              
              Research Interests
               
My research interests lie at the intersection of Computer Vision and Robotics, with a particular focus on building autonomous perception systems that can learn to understand and interact with the 3D world around them with minimal human supervision. As part of this broader effort, I have worked on problems in 3D Scene Reconstruction, Grounding for Vision-Language Models, and Human Pose Estimation.
                   
                   
                  I am always open to new collaborations and research ideas. Feel free to reach out if you are interested in working together!
               
           
       
        
        
          News
           
            
Oct '25 : Serving as a reviewer for CVPR 2026
Sep '25 : Received Travel Grant to attend ICCV 2025!
Aug '25 : Officially a PhD Candidate now!
Jun '25 : 2 papers accepted to ICCV 2025! See you in sunny Hawaii 🏝️
Jun '25 : I will be interning with NEC Labs over the summer!
Apr '25 : Received Travel Grant to attend CVPR 2025!
Apr '25 : Serving as a reviewer for IJCV
Mar '25 : SHIFT accepted to CVPR ABAW 2025!
Feb '25 : Serving as a reviewer for IROS 2025
Feb '25 : Paper accepted to CVPR 2025!
Nov '24 : Serving as a reviewer for ICRA 2025
Oct '24 : Serving as a reviewer for ICLR 2025
Dec '23 : Started the CRIS Colloquium, check it out here
Oct '23 : Serving as a reviewer for ICASSP 2023
Aug '23 : I will be joining UC Riverside for my PhD!
Jun '23 : SoccerKDNet accepted to Springer PReMI 2023!
Mar '23 : Joined Siemens as a research engineer intern!
Jan '23 : Serving as a reviewer for MLRC'23
             
           
         
        
        
        
        
        
        
        
        
Publications
        
        
        
        
Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes (NEW!)
        
        
        
  
        
        Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy Chowdhury
        
         
        
        IEEE/CVF International Conference on Computer Vision
        (ICCV), 2025
        
        
         
        
Also accepted to the 3D-in-the-Wild Workshop @ ICCV 2025
        
         
        
        
Abstract / Website / arXiv / Video / Code / BibTeX
        
        
        
        
        
        
Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem.
Consequently, when the scene is rendered from novel camera views, existing single-image-to-3D reconstruction methods produce incoherent and
blurry views, a problem that is exacerbated when the unseen regions lie far from the input camera. In this work, we address these inherent
limitations of existing single-image-to-3D-scene feedforward networks. To alleviate the poor performance caused by insufficient information
beyond the input image's view, we leverage a strong generative prior, in the form of a pre-trained latent video diffusion model, for iterative
refinement of a coarse scene represented by optimizable Gaussian parameters. To ensure that the style and texture of the generated images align
with those of the input image, we incorporate on-the-fly Fourier style transfer between the generated images and the input image. Additionally,
we design a semantic uncertainty quantification module that calculates per-pixel entropy and yields uncertainty maps, which guide the refinement
process from the most confident pixels while discarding the remaining highly uncertain ones. Extensive experiments on real-world scene datasets,
including in-domain RealEstate-10K and out-of-domain KITTI-v2, show that our approach provides more realistic and higher-fidelity novel view
synthesis results than existing state-of-the-art methods.
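
For illustration, a minimal sketch of the per-pixel entropy computation behind the uncertainty maps (PyTorch; the tensor shapes, names, and the keep fraction are assumptions for exposition, not the paper's released code):

    import torch

    def entropy_uncertainty_map(probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        # probs: (B, C, H, W) per-pixel semantic class probabilities -> (B, H, W) entropy
        return -(probs * (probs + eps).log()).sum(dim=1)

    def confidence_mask(probs: torch.Tensor, keep_fraction: float = 0.7) -> torch.Tensor:
        # Keep the lowest-entropy (most confident) pixels to guide refinement;
        # the remaining highly uncertain pixels are discarded.
        ent = entropy_uncertainty_map(probs)                           # (B, H, W)
        cutoff = torch.quantile(ent.flatten(1), keep_fraction, dim=1)  # per-image cutoff
        return ent <= cutoff.view(-1, 1, 1)                            # True where trusted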
        
        
         
        
        
        
         
          @article{bose2025uncertainty,
            title={Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes},
            author={Bose, Sarosij and Dutta, Arindam and Nag, Sayak and Zhang, Junge and Li, Jiachen and Karydis, Konstantinos and Chowdhury, Amit K Roy},
            journal={arXiv preprint arXiv:2503.15742},
            year={2025}
          }
        
         
        
        
        
        
        VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under Occlusions
        (NEW!)
        
        
        
  
        
        Yash Garg, Saketh Bachu, Arindam Dutta, Rohit Lal, Sarosij Bose, Calvin-Khang Ta, M. Salman Asif, Amit K. Roy Chowdhury
        
         
        
        IEEE/CVF International Conference on Computer Vision
        (ICCV), 2025
        
        
         
        
        
Abstract / Website / Video / arXiv / BibTeX
        
        
        
        
        
        
        
Human pose and shape (HPS) estimation methods have been extensively studied, with many demonstrating high zero-shot performance on in-the-wild images and videos.
However, these methods often struggle in challenging scenarios involving complex human poses or significant occlusions. Although some studies address 3D human pose
estimation under occlusion, they typically evaluate on datasets that lack realistic or substantial occlusions; for example, most existing datasets introduce
occlusions with random patches over the human or clipart-style overlays, which may not reflect real-world challenges. To bridge this gap, we introduce VOccl3D,
a novel video-based benchmark dataset for human occlusion with body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, we constructed this
dataset using advanced computer graphics rendering techniques, incorporating diverse real-world occlusion scenarios, clothing textures, and human motions.
Additionally, we fine-tuned recent HPS methods, CLIFF and BEDLAM-CLIFF, on our dataset, demonstrating significant qualitative and quantitative improvements on
multiple public datasets as well as on our test split, and compared their performance with other state-of-the-art methods. Furthermore, we leveraged our dataset
to enhance human detection under occlusion by fine-tuning an existing object detector, YOLO11, leading to a robust end-to-end HPS estimation system under
occlusions. Overall, this dataset serves as a valuable resource for future research on occlusion handling, offering a more realistic alternative to existing
occlusion datasets.
        
        
         
        
        
        
         
        
         
        
        
        
        
Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation
        
        
        
        
  
        
        Sarosij Bose, Hannah Dela Cruz, Arindam Dutta, Elena Kokkoni,
        Konstantinos Karydis, Amit K. Roy Chowdhury
        
         
        
IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR) ABAW Workshop, 2025
        
        
         
        
        
Abstract / Website / Paper / BibTeX
        
        
        
        
        
        
        
Human pose estimation is a critical tool across a variety of healthcare
applications. Despite significant progress in pose estimation algorithms
targeting adults, such developments for infants remain limited. Existing
algorithms for infant pose estimation, despite achieving commendable
performance, depend on fully supervised approaches that require large
amounts of labeled data. These algorithms also struggle with poor
generalizability under distribution shifts. To address these challenges,
we introduce SHIFT: Leveraging SyntHetic Adult Datasets for Unsupervised
InFanT Pose Estimation, which leverages the pseudo-labeling-based
mean-teacher framework to compensate for the lack of labeled data and
addresses distribution shifts by enforcing consistency between the
student and the teacher pseudo-labels. Additionally, to penalize
implausible predictions obtained from the mean-teacher framework, we
incorporate an infant manifold pose prior. To enhance SHIFT's
self-occlusion perception ability, we propose a novel visibility
consistency module for improved alignment of the predicted poses with
the original image. Extensive experiments on multiple benchmarks show
that SHIFT significantly outperforms existing state-of-the-art
unsupervised domain adaptation (UDA) based pose estimation methods by 5%
and supervised infant pose estimation methods by a margin of 16%.
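
For illustration, a minimal mean-teacher sketch in PyTorch showing the pseudo-label consistency idea SHIFT builds on (the momentum value, heatmap shapes, and function names are assumptions, not the released code):

    import copy
    import torch
    import torch.nn.functional as F

    def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
        # The teacher starts as a frozen copy of the student.
        teacher = copy.deepcopy(student)
        for p in teacher.parameters():
            p.requires_grad_(False)
        return teacher

    def ema_update(teacher, student, momentum: float = 0.999):
        # Teacher weights track an exponential moving average of the student's.
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(momentum).add_(s, alpha=1.0 - momentum)

    def consistency_loss(student, teacher, weak_img, strong_img):
        # The teacher's pseudo-labels on a weakly augmented view supervise the
        # student on a strongly augmented view of the same image.
        with torch.no_grad():
            pseudo = teacher(weak_img)          # e.g. (B, K, H, W) keypoint heatmaps
        return F.mse_loss(student(strong_img), pseudo)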
        
        
         
        
        
          
             
            @inproceedings{bose2025leveraging,
              title={Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation},
              author={Bose, Sarosij and Cruz, Hannah Dela and Dutta, Arindam and Kokkoni, Elena and Karydis, Konstantinos and Chowdhury, Amit Kumar Roy},
              booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
              pages={5562--5571},
              year={2025}
            }
          
         
        
        
        
        
        
        Conformal Prediction and MLLM-Aided Uncertainty Quantification in Scene Graph Generation
        
        
        
        
        
  
        
        Sayak Nag, Udita Ghosh, Calvin-Khang Ta, Sarosij Bose, Jiachen Li, 
        Amit K. Roy Chowdhury
        
         
        
        IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
        
         
        
        
Abstract / Paper / BibTeX
        
        
        
        
        
        
        Scene Graph Generation (SGG) aims to represent visual scenes by identifying objects 
        and their pairwise relationships, providing a structured understanding of image content.
        However, inherent challenges like long-tailed class distributions and prediction 
        variability necessitate uncertainty quantification in SGG for practical viability. In
        this paper, we introduce a novel Conformal Prediction (CP) framework, adaptable to any 
        existing SGG method, for quantifying predictive uncertainty by constructing well-calibrated 
        prediction sets over generated scene graphs. These prediction sets are designed to achieve
        rigorous coverage guarantees. Additionally, to ensure the sets contain the most visually 
        and semantically plausible scene graphs, we propose an MLLM-based post-processing strategy 
        that selects the best candidates within these sets. Our approach can produce diverse possible 
        scene graphs from a single image, assess the reliability of SGG methods, and ultimately 
        improve overall SGG performance.
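
For intuition, a generic split conformal prediction sketch (NumPy); the paper's per-relationship construction over scene graphs is more involved, so treat this as a toy classification example:

    import numpy as np

    def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
        # cal_probs: (n, C) predicted probabilities on a held-out calibration set.
        # Nonconformity score: 1 - probability assigned to the true class.
        n = len(cal_labels)
        scores = 1.0 - cal_probs[np.arange(n), cal_labels]
        q = np.ceil((n + 1) * (1.0 - alpha)) / n        # finite-sample correction
        return np.quantile(scores, min(q, 1.0), method="higher")

    def prediction_set(test_probs, qhat):
        # All classes whose score clears the calibrated threshold; the true class
        # is covered with probability >= 1 - alpha.
        return np.where(1.0 - test_probs <= qhat)[0]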
        
        
         
        
        
        
         
        @inproceedings{nag2025conformal,
          title={Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation},
          author={Nag, Sayak and Ghosh, Udita and Ta, Calvin-Khang and Bose, Sarosij and Li, Jiachen and Roy-Chowdhury, Amit K},
          booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
          pages={11676--11686},
          year={2025}
        }
        
         
        
        
        
        
        
        
        Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation
        
        
        
        
  
        
        
        Arindam Dutta, Sarosij Bose, Saketh Bachu, Calvin Khang-Ta,
        Konstantinos Karydis, Amit K. Roy Chowdhury
        
         
        
        
arXiv preprint, 2025
        
         
        
        
Abstract / arXiv / BibTeX
        
        
        
        
        
        
        Occlusions are a significant challenge to human pose estimation algorithms, often
        resulting in inaccurate and anatomically implausible poses. Although current
        occlusion-robust human pose estimation algorithms exhibit impressive performance
        on existing datasets, their success is largely attributed to supervised training
        and the availability of additional information, such as multiple views or temporal
        continuity. Furthermore, these algorithms typically suffer from performance
        degradation under distribution shifts. While existing domain-adaptive human pose
        estimation algorithms address this bottleneck, they tend to perform suboptimally
        when the target domain images are occluded, a common occurrence in real-life
        scenarios. To address these challenges, we propose OR-POSE: Unsupervised Domain
        Adaptation for Occlusion Resilient Human POSE Estimation. OR-POSE effectively
        mitigates domain shifts and overcomes occlusion challenges via a mean-teacher
        framework for iterative pseudo-label refinement. Additionally, OR-POSE enforces
        realistic pose prediction by leveraging a learned human pose prior that incorporates
        anatomical constraints into the adaptation process. Finally, OR-POSE avoids
        overfitting to inaccurate pseudo-labels on heavily occluded images by employing a
        visibility-based curriculum learning approach. Our experiments show that OR-POSE
        outperforms analogous state-of-the-art methods by ~7% on challenging occluded
        human pose estimation datasets.
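
For illustration, a sketch of the visibility-based curriculum idea: heavily occluded images are admitted only later in adaptation as the visibility threshold is relaxed (the linear schedule and its endpoints are assumptions, not the paper's exact settings):

    def curriculum_weight(visible_fraction: float, epoch: int, total_epochs: int) -> float:
        # visible_fraction: fraction of keypoints estimated to be visible.
        # The admission threshold decays linearly from 0.9 to 0.3 over training, so
        # the model avoids overfitting to noisy pseudo-labels on heavy occlusions early.
        threshold = 0.9 - 0.6 * (epoch / max(total_epochs - 1, 1))
        return 1.0 if visible_fraction >= threshold else 0.0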
        
        
         
        
        
        
         
        @article{dutta2025unsupervised, 
          title={Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation}, 
          author={Dutta, Arindam and Bose, Sarosij and Bachu, Saketh and Ta, Calvin-Khang and Karydis, Konstantinos and Roy-Chowdhury, Amit K}, 
          journal={arXiv preprint arXiv:2501.02773}, 
          year={2025} 
        }
        
         
        
        
        
        
        
        
        SoccerKDNet: A Knowledge Distillation Framework for Action Recognition in Soccer Videos
        
        
        
        
  
        
        Sarosij Bose, Saikat Sarkar, Amlan Chakrabarti
        
         
        
        10th Springer International Conference on Pattern Recognition and 
        Machine Intelligence (PReMI), 2023
        
         
        
        
Abstract / arXiv / Slides / Code / Dataset / BibTeX
        
        
        
        
        
        
Classifying player actions from soccer videos is a challenging problem, which has
become increasingly important in sports analytics over the years. Most
state-of-the-art methods employ highly complex offline networks, which makes
it difficult to deploy such models in resource-constrained scenarios. Here, we
propose a novel end-to-end knowledge-distillation-based transfer learning
network pre-trained on the Kinetics400 dataset, and then perform extensive
analysis on the learned framework. We also introduce a new dataset named
"SoccerDB1" containing 448 videos spanning 4 diverse classes of players playing
soccer. Furthermore, we propose a unique loss parameter that helps linearly
weigh the extent to which each network's predictions are utilized. Finally, we
conduct a thorough performance study under various hyperparameter settings and
benchmark the first classification results on the new SoccerDB1 dataset,
obtaining 67.20% validation accuracy. The dataset has been made publicly
available at:
https://bit.ly/soccerdb1
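
For illustration, a minimal sketch of a linearly weighted knowledge-distillation loss of the kind the abstract describes (PyTorch; the alpha and temperature values are assumptions, not the paper's settings):

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, T=4.0):
        # alpha linearly weighs how much each network's predictions are utilized:
        # ground-truth supervision vs. the teacher's softened predictions.
        ce = F.cross_entropy(student_logits, labels)
        kl = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)                                # standard temperature scaling
        return alpha * ce + (1.0 - alpha) * kl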
        
        
         
        
        
        
         
@InProceedings{10.1007/978-3-031-45170-6_47,
  author={Bose, Sarosij and Sarkar, Saikat and Chakrabarti, Amlan},
  editor={Maji, Pradipta and Huang, Tingwen and Pal, Nikhil R. and Chaudhury, Santanu and De, Rajat K.},
  title={SoccerKDNet: A Knowledge Distillation Framework for Action Recognition in Soccer Videos},
  booktitle={Pattern Recognition and Machine Intelligence},
  year={2023},
  publisher={Springer Nature Switzerland},
  address={Cham},
  pages={457--464},
  isbn={978-3-031-45170-6}
}
        
         
              
              
                
                
                  
                    
Realtime Motion Capture for VR Applications
                    
                  
                
                
  
                
                  Sarosij Bose, Jiju Poovvancheri
                
                 
                
                  MITACS Globalink Technical Report
                
                 
                
                
Abstract / Report / Slides / Code
                
                
                
                
                
                  
                    
We present a novel shape approximation method using a pill decomposition approach,
given the surface points and the corresponding normal at each point on the surface.
We first extract the maximal empty sphere representation of a given input shape
and then construct the `pill`, consisting of two sphere meshes. This collection of
pills is progressively decomposed to obtain a good approximation of the original shape.
Our algorithm is easy to reuse and implement and is currently available in a multi-processing setup.
To ensure reproducibility and further research, the source code and raw data have also been released.
                    
                  
                 
        
        
        
        
        
        
        Drone Assisted Forest Structural Classification of Kejimkujik National Park using Deep Learning
        
        
        
        
  
        
        
        Sutirtha Roy, Sarosij Bose, Karen Harper, Vaibhav Jaiswal, Manu Bansal
        
         
        
        
        3rd International Conference on Computing, Communication, 
        and Intelligent Systems (ICCCIS), 2022
        
         
        
        
Abstract / Paper / Slides / Code
        
        
        
        
        
        
Terrestrial forests and wooded lands are among the richest ecosystems because of
their inherent structural diversity, and structure plays a significant role as an
indicator of a forest's diversity. In this paper, we analyze the structural
diversity of Kejimkujik National Park using a drone and deep learning methods
that predict the forest's structural class. We train the deep learning model on
a novel forest structural diversity dataset collected using a DJI Mavic drone,
and propose a transfer learning framework based on the ResNet-50 architecture,
obtaining a test accuracy of 75.86%.
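
For illustration, a minimal ResNet-50 transfer learning sketch along these lines (torchvision; the number of structural classes and the frozen-backbone choice are assumptions for exposition):

    import torch.nn as nn
    from torchvision import models

    def build_classifier(num_structural_classes: int = 4) -> nn.Module:
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        for p in model.parameters():            # freeze the pre-trained backbone
            p.requires_grad = False
        # Replace the classifier head with one for the forest structural classes.
        model.fc = nn.Linear(model.fc.in_features, num_structural_classes)
        return model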
        
        
         
        
        
        
        
        
        Lipschitz Bound Analysis of Neural Networks
        
        
        
  
        
        
        Sarosij Bose
        
         
        
        
        13th IEEE International Conference on Computing Communication and Networking Technologies 
        (ICCCNT), 2022
        
         
        
        
Abstract / Paper / Slides / Code
        
        
        
        
        
        
        Lipschitz Bound Estimation is an effective method of regularizing deep neural networks 
        to make them robust against adversarial attacks. This is useful in a variety of applications 
        ranging from reinforcement learning to autonomous systems. In this paper, we highlight the 
        significant gap in obtaining a non-trivial Lipschitz bound certificate for Convolutional 
        Neural Networks (CNNs) and empirically support it with extensive graphical analysis. We also 
        show that unrolling Convolutional layers (or Toeplitz matrices) can be employed to convert 
        CNNs to a fully connected network. Further, we propose a simple algorithm to demonstrate the 
        existing 20×–50× gap in a particular data distribution between the actual Lipschitz constant 
        and the obtained tight bound. We also run thorough experiments on various network architectures, 
        benchmarking them on MNIST and CIFAR-10. All these proposals are supported by extensive testing, 
        graphs, histograms, and comparative analysis.
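
For context, the standard loose certificate that tighter bounds are measured against is the product of per-layer spectral norms; a minimal sketch (PyTorch), assuming convolutions have already been unrolled into their linear (Toeplitz) form:

    import torch

    def spectral_norm_product_bound(model: torch.nn.Module) -> float:
        # Upper-bounds the Lipschitz constant of a stack of Linear layers with
        # 1-Lipschitz activations by multiplying per-layer spectral norms.
        bound = 1.0
        for layer in model.modules():
            if isinstance(layer, torch.nn.Linear):
                bound *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
        return bound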
        
        
         
        
        
        
        
        
A Fusion Architecture Model for Human Activity Recognition
        
        
        
  
        
        
        Sarosij Bose, Amlan Chakrabarti
        
         
        
        
        18th IEEE India Council International Conference 
        (INDICON), 2021
        
         
        
        
Abstract / Paper / Slides / Code
        
        
        
        
        
        
        Human Activity Recognition (HAR) is a domain of increasing interest, with several
        two-stream architectures proposed in recent years. However, such models often have
        a huge number of parameters and large storage needs due to the presence of a dedicated
        temporal stream. In this paper, we propose an approach that performs a weighted late
        fusion between the Softmax scores of a spatiotemporal I3D stream and another 2D
        convolutional neural network stream (Xception). We show that our model achieves
        competitive performance compared to existing spatial and two-stream architectures,
        while significantly reducing the number of parameters and storage overhead.
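
For illustration, a minimal sketch of the weighted late fusion described above (PyTorch; the weight value is an assumption):

    import torch.nn.functional as F

    def late_fusion(i3d_logits, xception_logits, w: float = 0.6):
        # Fuse softmax scores from the spatiotemporal and 2D streams;
        # argmax over the result gives the predicted activity class.
        return w * F.softmax(i3d_logits, dim=1) + (1.0 - w) * F.softmax(xception_logits, dim=1)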
        
        
         
        
        
        
        
        
        
ResCNN: An Alternative Implementation of Convolutional Neural Networks
        
        
        
        
  
        
        
        Sarosij Bose*, Avirup Dey*
        
         
        
        
8th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering
(UPCON), 2021
        
         
        
        
Abstract / Paper / Slides / Code
        
        
        
        
        
        
        Convolutional Neural Networks (CNN) have long been used for feature extraction
        from images in deep learning. Here we introduce ResilientCNN or ResCNN, where
        we show that when convolution is implemented as a matrix–matrix operation,
        coupled with image processing techniques like Singular Value Decomposition (SVD),
        it can serve as a better alternative to traditional convolution. We demonstrate
        that our ResCNN learns using larger batch sizes and much higher learning rates
        (~7×) without compromising on accuracy, compared to traditional convolutional
        networks, by conducting experiments on the MNIST dataset.
        
        
         
        
        *Equal Contribution
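
For illustration, a minimal sketch of convolution expressed as a matrix-matrix product, the im2col/unfold trick the abstract alludes to (PyTorch; illustrative rather than the project's code):

    import torch
    import torch.nn.functional as F

    def conv_as_matmul(x, weight, stride=1, padding=0):
        # x: (B, C, H, W); weight: (K, C, kh, kw). Numerically matches F.conv2d.
        B, C, H, W = x.shape
        K, _, kh, kw = weight.shape
        cols = F.unfold(x, (kh, kw), stride=stride, padding=padding)  # (B, C*kh*kw, L)
        out = weight.view(K, -1) @ cols                               # (B, K, L)
        Hout = (H + 2 * padding - kh) // stride + 1
        Wout = (W + 2 * padding - kw) // stride + 1
        return out.view(B, K, Hout, Wout)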
        
         
Projects
        
           
           
            
TSCLite: A Powerful and Lightweight Traffic Sign Classification Model
            
             
              Sarosij Bose*, Avirup Dey*
             
             
Won 1st place at AI Entrepre-Neural 2021, organized by GES, IIT Kharagpur and Intel
           
          
This work presents two lightweight traffic sign classification implementations that can predict traffic signs from any real-time video feed.
A model based on a slightly enhanced LeNet architecture is trained on the German Traffic Sign Dataset (GTSD), which contains over 70,000 images
of traffic signs across more than 40 classes. Our model achieves a validation accuracy of over 98% and a training accuracy of over 97%.
The saved model is then optimized with the Intel OpenVINO Model Optimizer and Inference Engine and used to predict traffic signs live from
any video source (we used a webcam in our runs). We also provide a non-optimized implementation for comparison.
          *Equal Contribution
          
        
           
           
            
            RobustFreqCNN
            
             
              Sarosij Bose
             
             
             PyTorch implementation of this paper
           
          
This project is an unofficial implementation of the paper "Towards Frequency-Based Explanation for Robust CNN". It primarily examines the
extent to which image features are robust in the frequency domain: using the DCT transform and a pre-trained ResNet-18 model, RCT maps are
generated from adversarial as well as normal images.
          
         
        
         
        
        
          
            
              Misc 
               
Served as a reviewer for ICLR, IROS, ICRA, WACV, ICASSP and MLRC
My talks at the KyushuTech-CU joint symposium on Activity Recognition and 3D Convolution are here and here
By popular request, I have written up the MITACS application process in a blog post

My Dijkstra number is 4
           
       
       
        
        
        
          
            
© Sarosij Bose (2023) | When in Rome, do as the Romans do
             
          
         
      
    
   