SHIFT adapts a model pre-trained on a labeled adult source dataset (xs, ys) to unlabeled infant target images (xt) using the Mean-Teacher framework, in which the teacher model Mt is updated with an Exponential Moving Average (EMA) of the student model Ms's weights (Section 3.2). To address anatomical variations in infants, SHIFT employs an infant pose prior \(\theta_p\) that assigns a plausibility score to each prediction of the student model Ms (Section 3.3). Further, to handle the large self-occlusions in the target domain, we employ an off-the-shelf segmentation model Fseg to produce pseudo segmentation masks pt, with which our Kp2Seg module \(G(\cdot)\) learns to perform pose-image visibility alignment (Section 3.4), thereby effectively leveraging the context present in the visible portions of each image. All learnable components of the framework are shown in red and the rest in black.
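The Mean-Teacher EMA update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the parameter names (`teacher`, `student`, `ema_decay`) and the plain-dict weight representation are assumptions for clarity.

```python
def ema_update(teacher, student, ema_decay=0.999):
    """Update teacher weights in place: t <- d * t + (1 - d) * s.

    `teacher` and `student` are dicts mapping parameter names to weights
    (a stand-in for real model state dicts). `ema_decay` close to 1 makes
    the teacher a slowly-moving average of the student.
    """
    for name, s_w in student.items():
        teacher[name] = ema_decay * teacher[name] + (1.0 - ema_decay) * s_w
    return teacher

# Usage: after each student optimization step, nudge the teacher
# a small fraction of the way toward the current student weights.
student = {"conv1.weight": 1.0}
teacher = {"conv1.weight": 0.0}
teacher = ema_update(teacher, student, ema_decay=0.9)
```

With a decay of 0.9, the teacher weight moves 10% of the way toward the student weight per step, which smooths out noisy per-step student updates.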
Qualitative Results: Qualitative results on SURREAL → SyRIP (top 3 rows) and SURREAL → MINI-RGBD (bottom 2 rows). From left to right:
source-only keypoints, keypoint predictions by UniFrame, predictions by FiDIP, predictions by SHIFT, and ground-truth keypoints. As
can be seen above, the infant prior is essential for predicting plausible poses in cases where other methods fail (top row). Further, our method
can utilize context from visible regions to predict keypoints in self-occluded areas (2nd and 3rd rows) while seamlessly adapting to different
scenarios (4th and 5th rows). ○ denotes the self-occluded regions in the images.
Pose Estimation under Self-Occlusions: SURREAL → SyRIP. The UniFrame prediction (left panel) fails to correctly estimate significant portions of the infant's lower back and left hand, whereas
SHIFT recovers them reasonably well. The ground truth (rightmost panel) and the extracted mask (second panel from left) are also shown.
Within the domain adaptation setting for infant pose estimation, SHIFT outperforms existing approaches. As shown below, SHIFT achieves superior performance compared to both unsupervised domain adaptation (UDA) methods and fully supervised infant pose estimation techniques. The best numbers are highlighted in bold; the second best are underlined.
Table 1: Comparison with UDA methods on adult → infant adaptation: SURREAL → MINI-RGBD (left) and SURREAL → SyRIP (right).
Table 2: Comparison with UDA methods on infant → infant adaptation: SyRIP → MINI-RGBD.
@InProceedings{Bose_2025_CVPR,
author = {Bose, Sarosij and Cruz, Hannah Dela and Dutta, Arindam and Kokkoni, Elena and Karydis, Konstantinos and Chowdhury, Amit Kumar Roy},
title = {Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
month = {June},
year = {2025},
pages = {5562-5571}
}
Copyright: CC BY-NC-SA 4.0 © Sarosij Bose | Last updated: 15th July 2025 | Website credits to Nerfies