Sarosij Bose


I am a Ph.D. student at the University of California, Riverside, co-advised by Professors Amit K. Roy Chowdhury and Konstantinos Karydis. My research revolves around how autonomous systems perceive the world and on building pipelines that enable them to understand scenes with minimal supervision, just as humans do.

I received my Bachelor of Technology (B.Tech.) degree in Computer Science and Engineering from the University of Calcutta, India. I have been fortunate to be advised by some amazing researchers over the years. I interned at Siemens with the Future of Automation (FoA) division, designing sensor-realistic LiDAR simulation systems. I am a recipient of the MITACS Globalink Fellowship and worked with Prof. Jiju Poovvancheri at GSC Lab, Saint Mary's University on skeleton extraction for motion capture. Earlier, I spent consecutive semester breaks at the Indian Institute of Science working with Prof. Anirban Chakraborty at VCL Lab and Prof. Kunal Narayan Chaudhury at LISA Lab on person re-identification and Lipschitz regularization, respectively. I have also worked extensively with Prof. Amlan Chakrabarti on action recognition from videos.

Email /  Scholar  /  Twitter  /  Github  /  LinkedIn


Research Interests

I am currently focused on enhancing the perception capabilities of robotic systems by developing robust and generalizable world representations, with a particular interest in leveraging foundation models for comprehensive scene understanding. I have also worked on pose estimation under occlusions. My previous work spanned several sub-domains, including video understanding, person re-identification and shape approximation methods, with an emphasis on utilizing unlabeled data for real-world applications.


I am always open to new collaborations and research ideas. Feel free to reach out if you are interested in working together!

News

Oct '24 : Serving as a reviewer for ICLR 2025
Dec '23 : Started the CRIS Colloquium, check it out here
Oct '23 : Serving as a reviewer for ICASSP 2023
Aug '23 : I will be joining UC Riverside for my PhD!
Jun '23 : SoccerKDNet accepted to Springer PReMI 2023!
Mar '23 : Joined Siemens as a research engineer intern!
Jan '23 : Serving as a reviewer for MLRC'23
Oct '22 : LBANN accepted as a full paper to IEEE ICCCNT 2022!
Oct '22 : Drone Assisted Classification paper accepted at IEEE ICCCIS 2022!
Sep '22 : Check out the MITACS Report and summer seminar slides
Jul '22 : Accepted to the 6th CVIT Summer School by IIIT Hyderabad!
May '22 : Accepted to the Robotics and AI Summer School 2022 organized by IRI, CSIC-UPC!
Dec '21 : Awarded the MITACS GRI Fellowship! I will be visiting Saint Mary's University, Halifax for the summer

Research
2023
SoccerKDNet: A Knowledge Distillation Framework for Action Recognition in Soccer Videos
Sarosij Bose, Saikat Sarkar, Amlan Chakrabarti
10th Springer International Conference on Pattern Recognition and Machine Intelligence (PReMI), 2023
arxiv / slides / code / dataset

Classifying player actions from soccer videos is a challenging problem that has become increasingly important in sports analytics over the years. Most state-of-the-art methods employ highly complex offline networks, which makes such models difficult to deploy in resource-constrained scenarios. In this paper, we propose a novel end-to-end knowledge distillation based transfer learning network pre-trained on the Kinetics400 dataset and then perform extensive analysis on the learned framework by introducing a unique loss parameterization. We also introduce a new dataset named "SoccerDB1" containing 448 videos and consisting of 4 diverse classes, each of players playing soccer. Furthermore, we introduce a unique loss parameter that helps us linearly weigh the extent to which the predictions of each network are utilized. Finally, we perform a thorough performance study with varied hyperparameters. We also benchmark the first classification results on the new SoccerDB1 dataset, obtaining 67.20% validation accuracy. The dataset has been made publicly available at: https://bit.ly/soccerdb1
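As a rough illustration of a loss parameter that linearly weighs two networks' predictions, here is a minimal sketch of a generic distillation loss. It is not the SoccerKDNet formulation: the function names, the parameter `alpha`, and the choice of cross-entropy for both terms are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label, alpha):
    """Hypothetical blended loss: alpha weighs the ground-truth term,
    (1 - alpha) weighs the term that matches the teacher's predictions."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student))]
    hard_term = cross_entropy(one_hot, student)   # student vs. true label
    soft_term = cross_entropy(teacher, student)   # student vs. teacher
    return alpha * hard_term + (1.0 - alpha) * soft_term
```

With `alpha = 1.0` the student ignores the teacher entirely, and with `alpha = 0.0` it learns only from the teacher's soft predictions; values in between interpolate linearly.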

2022
Realtime motion capture for VR Applications
Sarosij Bose, Jiju Poovvancheri
MITACS Globalink Technical Report
paper / slides / code

We present a novel shape approximation method using a pill decomposition approach, given the surface points and the corresponding normal at each point on the surface. We first extract the maximal empty sphere representation of a given input shape and then construct the `pill`, which consists of two sphere meshes. This collection of pills is progressively decomposed to obtain a good approximation of the original shape. Our algorithm is easy to reuse and implement and is currently available in a multi-processing setup. To ensure reproducibility and support further research, the source code and raw data have also been released.

Drone Assisted Forest Structural Classification of Kejimkujik National Park using Deep Learning
Sutirtha Roy, Sarosij Bose, Karen Harper, Vaibhav Jaiswal, Manu Bansal
3rd International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), 2022
paper / slides / code

Terrestrial forests and wooded lands are among the richest resources because of their inherent structural diversity, and structure plays a significant role as an indicator of a forest's diversity. In this paper, we analyze the structural diversity of Kejimkujik National Park with the help of a drone and deep learning methods that predict the structural class of the forest. We propose a transfer learning framework based on the ResNet-50 architecture, with which we obtained a test accuracy of 75.86%. To train the deep learning model, we used a novel forest structural diversity dataset collected with a DJI Mavic drone.

Lipschitz Bound Analysis of Neural Networks
Sarosij Bose
13th IEEE International Conference on Computing Communication and Networking Technologies (ICCCNT), 2022
paper / slides / code

Lipschitz bound estimation is an effective method of regularizing deep neural networks to make them robust against adversarial attacks. This is useful in a variety of applications ranging from reinforcement learning to autonomous systems. In this paper, we highlight the significant gap in obtaining a non-trivial Lipschitz bound certificate for Convolutional Neural Networks (CNNs) and empirically support it with extensive graphical analysis. We also show that unrolling convolutional layers or Toeplitz matrices can be employed to convert CNNs to a fully connected network. Further, we propose a simple algorithm to show the existing 20x-50x gap, in a particular data distribution, between the actual Lipschitz constant and the obtained tight bound. We also ran sets of thorough experiments on various network architectures and benchmarked them on datasets like MNIST and CIFAR-10. All these proposals are supported by extensive testing, graphs, histograms and comparative analysis.
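To make the unrolling idea concrete, the sketch below shows, for the simple 1D case, how a convolution can be written as multiplication by a Toeplitz matrix, which is what lets a convolutional layer be treated as a fully connected one. The function names are hypothetical and this is not the paper's code; it assumes stride 1, no padding, and the cross-correlation convention common in deep learning.

```python
def conv1d_valid(signal, kernel):
    """Direct 1D cross-correlation with stride 1 and no padding."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(n - k + 1)]


def toeplitz_from_kernel(kernel, input_length):
    """Unroll the kernel into a (input_length - k + 1) x input_length matrix.

    Each row holds the kernel shifted one step to the right, so the
    matrix has constant diagonals (Toeplitz structure).
    """
    k = len(kernel)
    rows = input_length - k + 1
    matrix = [[0.0] * input_length for _ in range(rows)]
    for i in range(rows):
        for j in range(k):
            matrix[i][i + j] = kernel[j]
    return matrix


def matvec(matrix, vector):
    """Plain matrix-vector product, i.e. a fully connected layer without bias."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
unrolled = toeplitz_from_kernel(kernel, len(signal))
# Both routes compute the same output values.
assert matvec(unrolled, signal) == conv1d_valid(signal, kernel)
```

Once a layer is in matrix form, standard matrix-norm tools apply to it; for example, the product of the layers' spectral norms upper-bounds the Lipschitz constant of a feedforward network with 1-Lipschitz activations.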

2021
A Fusion Architecture model for Human Activity Recognition
Sarosij Bose, Amlan Chakrabarti
18th IEEE India Council International Conference (INDICON), 2021
paper / slides / code

Human Activity Recognition (HAR) is a domain of increasing interest, with several two-stream architectures being suggested in recent years. However, such models have a huge number of parameters and large storage needs due to the presence of a dedicated temporal stream. In this paper, we propose an architecture comprising the weighted late fusion of the Softmax scores of the spatiotemporal stream (I3D) and another 2D convolutional neural network stream (Xception). We show that our model produces competitive performance with respect to other existing spatial and two-stream architectures while significantly reducing the number of parameters and minimizing storage costs.
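Weighted late fusion of Softmax scores can be sketched as a convex combination of the two streams' class probabilities followed by an argmax. The function names, the single scalar weight, and the example scores below are assumptions for illustration; the paper's actual weighting scheme is not specified here.

```python
def late_fusion(scores_a, scores_b, weight_a):
    """Blend two per-class probability vectors.

    weight_a is the (hypothetical) weight given to the first stream;
    the second stream receives 1 - weight_a.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(scores_a) != len(scores_b):
        raise ValueError("both streams must score the same classes")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


def predict(fused_scores):
    """Return the index of the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


# Made-up Softmax outputs for three classes from two streams.
spatiotemporal_scores = [0.7, 0.2, 0.1]
spatial_scores = [0.2, 0.6, 0.2]

fused = late_fusion(spatiotemporal_scores, spatial_scores, weight_a=0.5)
predicted_class = predict(fused)
```

Because fusion happens on the output scores, the two streams can be trained and stored independently, and only the fusion weight needs tuning.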

ResCNN: An alternative implementation of Convolutional Neural Networks
Sarosij Bose*, Avirup Dey*
8th IEEE Uttar Pradesh International Conference (UPCON), 2021
paper / slides / code

Convolutional Neural Networks (CNNs) have long been used for feature extraction from images in deep learning. Here we introduce ResilientCNN, or ResCNN for short, and show that convolution implemented as a matrix-matrix operation, coupled with image processing techniques such as Singular Value Decomposition (SVD), can be used as a better alternative to traditional convolution. We show that ResCNN learns with bigger batch sizes and at much higher learning rates (7x) without compromising on accuracy compared to traditional convolutional networks, by implementing both models on the MNIST dataset.
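One common way to express convolution as a matrix-matrix operation is the im2col trick: flatten every image patch into a row, flatten every kernel into a column, and multiply. The sketch below shows that general technique only; it is not the ResCNN implementation, it omits the SVD step, and all names are hypothetical.

```python
def im2col(image, k):
    """Flatten every k x k patch (stride 1, no padding) into one row."""
    h, w = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
        for r in range(h - k + 1)
        for c in range(w - k + 1)
    ]


def matmul(a, b):
    """Plain matrix-matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_as_matmul(image, kernels):
    """Apply several square kernels at once with a single matrix product.

    Returns a (num_patches x num_kernels) matrix: entry [p][f] is the
    response of kernel f at patch p (patches in row-major order).
    """
    k = len(kernels[0])
    patches = im2col(image, k)                                   # num_patches x k*k
    flat_kernels = [[v for row in kern for v in row] for kern in kernels]
    weights = [list(col) for col in zip(*flat_kernels)]          # k*k x num_kernels
    return matmul(patches, weights)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],   # sums the main diagonal of each patch
           [[1, 1], [1, 1]]]   # sums the whole patch
responses = conv2d_as_matmul(image, kernels)
# responses == [[6, 12], [8, 16], [12, 24], [14, 28]]
```

Batching all kernels into one product is what turns the per-kernel matrix-vector products into a single matrix-matrix operation.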

*Equal Contribution
Select Projects

TSCLite: A powerful and lightweight Traffic Sign Classification model Implementation
Sarosij Bose*, Avirup Dey*

Won 1st position in AI Entrepre-Neural, 2021 by GES, IIT Kharagpur and Intel

This work focuses on two lightweight traffic sign classification implementations that can predict traffic signs from any real-time video feed. Here, a model based on a slightly enhanced LeNet architecture has been trained on the German Traffic Sign Dataset (GTSD), which has over 70000 images of traffic signs across more than 40 classes. Our model achieves a validation accuracy of over 98% and a training accuracy of over 97%. The saved model is then optimized with the Intel OpenVINO Model Optimizer + Inference Engine and run directly to predict traffic signs live from any video source (we used a webcam for our run). We have also provided a non-optimized solution for comparison purposes.

*Equal Contribution


RobustFreqCNN
Sarosij Bose

PyTorch implementation of this paper

This project is an unofficial implementation of the paper "Towards Frequency-Based Explanation for Robust CNN". It primarily deals with the extent to which image features are robust in the frequency domain. Here, the DCT and a pre-trained ResNet-18 model are used, and the RCT maps are generated from the adversarial as well as the normal images.
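For readers unfamiliar with the DCT, the sketch below computes an unnormalized 1D DCT-II directly from its definition. It only illustrates the transform itself and is not taken from the project's code; images use the 2D version, which applies the same transform along rows and then columns.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(signal))
        for k in range(n)
    ]


# A constant signal has no variation, so all of its energy lands in
# the lowest-frequency coefficient (index 0) and the rest are ~0.
coefficients = dct2([1.0, 1.0, 1.0, 1.0])
```

Low-index coefficients capture smooth, large-scale structure, while high-index coefficients capture fine detail, which is the split that frequency-domain robustness analyses examine.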

Misc

  • Served as a reviewer for ICLR, WACV, ICASSP and MLRC
  • My talks at the KyushuTech-CU joint symposium on Activity Recognition and 3D Convolution here and here
  • On popular request, I have put up the MITACS application process on a blog
  • My Dijkstra number is 4


  • © Sarosij Bose (2023) | When in Rome, do as the Romans do