Recent advances in deep learning have significantly propelled computer vision, particularly 3D human model recovery from monocular images. This work centers on developing efficient deep learning models for digitizing human subjects, laying a solid foundation for a range of downstream applications. Estimating a 3D human mesh from a monocular image typically demands complex deep learning models; we instead propose a hybrid approach that combines deep learning with analytical inverse kinematics to estimate 3D pose and shape precisely.
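The analytical inverse-kinematics component can be illustrated with a minimal sketch: once a network has predicted 3D joint positions, the rotation of each bone can be solved in closed form rather than regressed. The function below, whose name and interface are illustrative rather than taken from this work, aligns a template bone direction with a predicted bone direction via the Rodrigues rotation formula.

```python
import numpy as np

def analytical_ik(template_dir, predicted_dir):
    """Closed-form rotation aligning a rest-pose (template) bone direction
    with a network-predicted bone direction, via the Rodrigues formula."""
    a = template_dir / np.linalg.norm(template_dir)
    b = predicted_dir / np.linalg.norm(predicted_dir)
    axis = np.cross(a, b)
    s = np.linalg.norm(axis)      # sin(theta)
    c = np.dot(a, b)              # cos(theta)
    if s < 1e-8:
        if c > 0:                 # already aligned
            return np.eye(3)
        # antiparallel: rotate 180 degrees about any axis perpendicular to a
        perp = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(perp) < 1e-8:
            perp = np.cross(a, [0.0, 1.0, 0.0])
        perp /= np.linalg.norm(perp)
        return 2.0 * np.outer(perp, perp) - np.eye(3)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]]) / s   # skew matrix of unit axis
    # Rodrigues: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```

Solving each bone rotation analytically in this way keeps the learned component small: the network only predicts joint positions, and the kinematic chain is recovered deterministically.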
Our precise 3D pose estimates enable three high-impact downstream applications. First, we build a real-time biomechanics analysis system that delivers low-cost, accurate estimates of kinematic sequences for managing joint health and performance; the system seamlessly integrates mobile, modular 3D pose estimation with model-based inverse kinematics optimization. The second downstream task is skeleton-based human action recognition (HAR), which has broad applications in smart homes, cities, and retail. By rendering 3D pose sequences as RGB images and training conventional CNN architectures with various data augmentation schemes, we achieve results comparable to sophisticated graph neural network models. Finally, in scenarios where visual cues are scarce yet human monitoring remains essential, radar-based sensing offers a non-intrusive way to track human movements and vital signs. Given the scarcity of large radar datasets, our third downstream task introduces a "virtual radar" framework that, driven by 3D pose and physics-informed principles, generates synthetic radar data, opening a privacy-preserving avenue toward a nuanced understanding of human behavior.
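The rendering of pose sequences as RGB images admits a minimal sketch. One common encoding, shown below under assumed conventions (time on the horizontal axis, joints on the vertical axis, x/y/z mapped to R/G/B), turns a (T, J, 3) sequence into an image that a standard CNN can consume; the function name and exact normalization are illustrative, not necessarily those used in this work.

```python
import numpy as np

def pose_sequence_to_image(poses):
    """Encode a (T, J, 3) 3D pose sequence as a (J, T, 3) uint8 RGB image:
    columns index time, rows index joints, channels hold x/y/z coordinates
    normalized per axis to [0, 255]."""
    lo = poses.min(axis=(0, 1), keepdims=True)          # per-axis minimum
    hi = poses.max(axis=(0, 1), keepdims=True)          # per-axis maximum
    norm = (poses - lo) / np.maximum(hi - lo, 1e-8)     # per-axis [0, 1]
    img = (norm * 255.0).astype(np.uint8)               # (T, J, 3)
    return img.transpose(1, 0, 2)                       # (J, T, 3)
```

Because the sequence becomes an ordinary image, standard image-space augmentations (cropping in time, channel jitter, flipping) apply directly, which is what allows conventional CNN pipelines to compete with graph-based models on skeleton data.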
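The physics-informed core of a virtual radar can likewise be sketched. A simple assumption, used here for illustration only, is to treat each body joint as a point scatterer and compute its range and Doppler shift toward the radar from the standard relation f_d = 2 v_r / λ; the function name, interface, and the 77 GHz mmWave wavelength default are assumptions, not details of the framework itself.

```python
import numpy as np

def point_scatterer_returns(joint_pos, joint_vel, radar_pos, wavelength=0.0039):
    """Treat each joint as a point scatterer: compute its range to the radar
    and its Doppler shift f_d = 2 * v_r / lambda, where v_r is the radial
    velocity along the line of sight. Default wavelength ~3.9 mm (77 GHz).

    joint_pos, joint_vel: (J, 3) arrays; radar_pos: (3,) array.
    Returns (range in m, Doppler shift in Hz), each of shape (J,)."""
    los = joint_pos - radar_pos                  # (J, 3) line-of-sight vectors
    rng = np.linalg.norm(los, axis=-1)           # (J,) range
    unit = los / rng[:, None]                    # unit line-of-sight directions
    v_r = np.sum(joint_vel * unit, axis=-1)      # (J,) radial velocity
    f_d = 2.0 * v_r / wavelength                 # (J,) Doppler shift, Hz
    return rng, f_d
```

Accumulating these per-joint returns over the frames of a 3D pose sequence yields a synthetic micro-Doppler signature, which is the kind of privacy-preserving training signal such a framework can generate without collecting real radar recordings.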