Aayan Yadav

BTech Student
IIT Roorkee

About Me

I am a senior at the Mehta Family School of Data Science and Artificial Intelligence at IIT Roorkee. My research interests lie in 3D computer vision: 3D representations, reconstruction, and scene understanding. My aim is to develop data- and compute-efficient models that understand the physical world.

Most recently, I interned at AuraML, where I worked on text-to-3D scene generation. In the past, I had the privilege of working with Prof. Justin Johnson and Dr. Karan Desai on Benchmarking Object Detectors with COCO: A New Path Forward, where we refined the annotations of the MS COCO dataset. I am currently working with Prof. Sanjeev Kumar on 3D face reconstruction.

I am actively looking for opportunities in 3D computer vision. I am interested in full-time research roles and PhD positions starting Fall 2026, and I am open to collaboration. If your work aligns with my interests, please reach out!

News

[April 2025]: Joining AuraML as a Research Intern!
[July 2024]: COCO-ReM is accepted to ECCV 2024!
[December 2023]: Reached finals of Smart India Hackathon 2023!
[October 2022]: Joining IIT Roorkee as a bachelor's student.

Publications

Benchmarking Object Detectors with COCO: A New Path Forward

Shweta Singh^*, Aayan Yadav^*, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai

ECCV 2024 paper | bibtex | code | website

StegaVision: Enhancing Steganography with Attention Mechanism (Student Abstract)

Abhinav Kumar, Pratham Singla, Aayan Yadav

AAAI 2025 paper | bibtex | code

Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models

Shree Singhi, Aayan Yadav, Aayush Gupta, Shariar Ebrahimi, Parisa Hassanizadeh

Under Review paper | bibtex | code

Projects

SLAFCoM: A Study on Loss Functions for Adversarial Finetuning of Contrastive Models

Introduced a Clean Consistency Term in the loss function and experimented with different weights and learning rates to improve adversarial finetuning of contrastive models.

GitHub

Sirius

An agentic RAG system using SoTA techniques such as AdaRAG, PlanRAG, HyDE, SPLADE, MetRAG, and RRF.

GitHub

MedMatcher

Similar-document template matching for a medical dataset. Fine-tuned a LayoutLMv3 model on a custom medical document dataset using weighted cross-entropy loss and mini-batch gradient descent.

GitHub

Image Captioning Model

Built an image captioning model using transfer learning techniques on the Flickr8k dataset. We fine-tuned a combination of pretrained InceptionV3 and LSTM with regularization.

GitHub

Blogs

Dismantling Disentanglement in VAEs

In this blog post, I give a brief introduction to variational autoencoders and then explain how we can achieve disentanglement in the latent space. It is an explanation of this paper.

Activation Functions

This is a beginner's introduction to activation functions. It was my first-ever blog post, which I wrote for the Blogathon organised by DSG IITR!