CMU Researchers Create An AI Model That Can Detect The Pose of Multiple Humans In A Room Using Only The Signals From WiFi

Author Admin
Published February 16, 2023

Improvements in computer vision and machine learning algorithms have driven significant advances in 2D and 3D human pose estimation using RGB cameras, LiDAR, and radar. However, occlusion and poor lighting, both common in many scenarios of interest, degrade pose estimation from images. Radar and LiDAR, on the other hand, require expensive, power-hungry, specialized hardware. Using these sensors in private spaces also raises serious privacy concerns.

To overcome these constraints, recent studies have explored using WiFi antennas (1D sensors) for body segmentation and body key-point detection. This article discusses how WiFi signals can be combined with deep learning architectures commonly used in computer vision to estimate dense human pose correspondence. In a recently released paper, researchers at Carnegie Mellon University (CMU) describe DensePose from WiFi, an artificial intelligence (AI) model that can estimate the poses of multiple people in a room using only WiFi signals. In experiments on real-world data, the algorithm achieves an average precision of 87.2 at the 50% IoU threshold.
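To make the 50% IoU threshold concrete, here is a minimal sketch of how a predicted bounding box is typically matched against a reference box. It assumes the metric is the usual box-level intersection over union used in detection benchmarks; the function names and example boxes are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical boxes, for illustration only.
truth = (0, 0, 10, 10)
close_pred = (1, 1, 10, 10)   # mostly overlaps the reference box
far_pred = (5, 5, 15, 15)     # overlaps only one corner

print(is_match(close_pred, truth))
print(is_match(far_pred, truth))
```

Average precision at this threshold then summarizes how many predictions pass the test across all confidence levels; that aggregation step is omitted here.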

Because WiFi signals are one-dimensional, most existing techniques for WiFi-based person detection can only locate a person’s center of mass, and often only for a single person. The CMU method instead uses the amplitude and phase of three WiFi signals recorded by three different receivers. This yields a 3×3 feature map that is fed into a neural network, which produces UV maps of human body surfaces and can locate several people and estimate their poses.
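As a rough illustration of the amplitude and phase data mentioned above, the sketch below splits a 3×3 grid of complex channel measurements into an amplitude grid and a phase grid. The grid layout (rows as signals, columns as receivers), the function name, and the sample values are assumptions made for this example, not details confirmed by the article.

```python
import cmath


def split_amplitude_phase(csi):
    """Split a grid of complex samples into (amplitude, phase) grids."""
    amplitude = [[abs(h) for h in row] for row in csi]
    phase = [[cmath.phase(h) for h in row] for row in csi]  # radians in (-pi, pi]
    return amplitude, phase


# Hypothetical 3x3 measurement: rows are signals, columns are receivers.
csi = [
    [cmath.rect(1.0 + signal, 0.5 * receiver - 0.5) for receiver in range(3)]
    for signal in range(3)
]

amplitude, phase = split_amplitude_phase(csi)
for row in amplitude:
    print([round(a, 3) for a in row])
for row in phase:
    print([round(p, 3) for p in row])
```

In a real capture each cell would hold many such samples over frequency and time, so the amplitude and phase grids become tensors and not single 3×3 tables.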

The approach uses three components to extract UV coordinates of the human body surface from WiFi signals. First, the raw channel state information (CSI) signals are cleaned using amplitude and phase sanitization. Second, a two-branch encoder-decoder network performs domain translation from the sanitized CSI samples to 2D feature maps that resemble images. Third, the 2D features are fed into a modified DensePose-RCNN architecture to estimate the UV map, a representation of the dense correspondence between 2D and 3D humans. To improve the training of the WiFi-input network, the team uses transfer learning: before training the main network, they minimize the differences between the multi-level feature maps produced from images and those produced from WiFi signals.
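The article does not spell out what phase sanitization involves. One common way to clean CSI phase is to unwrap the 2π jumps and then subtract a fitted straight line, and the sketch below shows that under those assumptions. The function names and the synthetic input are hypothetical, and the researchers' actual pipeline may include additional filtering steps.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples (radians)."""
    if not phases:
        return []
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        # Map the step into [-pi, pi) so the sequence stays continuous.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + delta)
    return out


def remove_linear_trend(values):
    """Subtract the least-squares straight line fitted over the sample index."""
    n = len(values)
    if n == 0:
        return []
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return [v - (mean_y + slope * (i - mean_x)) for i, v in enumerate(values)]


def sanitize_phase(raw_phases):
    """Assumed two-step cleaning: unwrap, then remove the linear offset."""
    return remove_linear_trend(unwrap(raw_phases))


# Synthetic example: a pure linear phase ramp, wrapped into (-pi, pi].
true_phase = [0.9 * i for i in range(10)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]

cleaned = sanitize_phase(wrapped)
print([round(c, 6) for c in cleaned])
```

A purely linear ramp carries no pose information in this model, so the cleaned output is close to zero everywhere; real measurements would retain the residual variation around the fitted line.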

The model’s performance was tested on a dataset of WiFi signals and video recordings of scenes containing one to five people. The scenes were recorded in both offices and classrooms. Because the videos have no annotations to serve as ground truth for the evaluation, the researchers applied pre-trained DensePose models to the videos to produce pseudo ground truth. Overall, the model was only “successfully able to recognize the approximate locations of human boundary boxes” and the pose of torsos.

The group identified two primary categories of failure cases.

(1) The WiFi-based model is biased and tends to produce faulty body parts when a body pose is rarely seen in the training set.

(2) When three or more subjects appear in one capture at the same time, it is harder for the WiFi-based approach to extract precise information for each subject from the amplitude and phase tensors of the entire capture.

The researchers believe that collecting more comprehensive training data will help solve both problems.

The work’s performance is still limited by the training data available for WiFi-based perception, particularly across different room layouts. In future research, the team plans to collect data from multiple layouts and extend the work to predict 3D human body shapes from WiFi signals. With this dense perception capability, WiFi devices could become a more affordable, illumination-invariant, and privacy-preserving human sensor than RGB cameras and LiDAR.

Check out the Paper. All credit for this research goes to the researchers on this project.

Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.
