Arijit Ray's Webpage

Arijit Ray

Computer Science Ph.D. Student


As AI expands the scope of human creativity, I wish to enable the next generation of human-machine collaborative tasks.

About me and my research:[1]

I am a Computer Vision Ph.D. student at Boston University (BU), advised by Kate Saenko and Bryan Plummer. I collaborate closely with Ranjay Krishna from the University of Washington. I received my M.S. from Virginia Tech, advised by Devi Parikh.

Recently, I have been interested in adapting multimodal language models to reason compositionally about the visual and physical world. This means being able to imagine scenarios in 3D environments and take actions in them. I am also part of the BU + UC Berkeley team in the DARPA Semantic Forensics program to fight misinformation. I enjoy creating ecosystems that encourage creative building, which is why I am a member of the AI for Impact Venture Studio at MIT.

If you are interested in collaborating or just chatting about research on multimodal models to help human creativity, say hi!

AI Resident, Mineral, Google X, 2023

Research Scientist Intern, Facebook AI Research, 2022

Computer Scientist, DARPA XAI, 2017-2021

Deep Learning Intern, Blue River Technologies (acquired by John Deere), 2016

Highlights:

2023: Our paper on evaluating and adapting models for compositional reasoning got accepted to NeurIPS.

2023: A paper by one of my students was accepted as an oral at an ICCV 2023 workshop.

2023: I am excited to be spending my summer as an AI Resident at Google X (Mineral), working on adapting multimodal language models for object localization.

2022: We started AI+X at BU and Harvard, a group where we brainstorm research and venture ideas on how AI can impact concurrent research areas.

2022: I am excited to be spending my summer as a Research Scientist Intern at Meta (Facebook) AI (FAIR), working on the compositionality of large vision-language models.

2019: I was a runner-up at the SRI CVT Shark Tank Competition, which supported my mini-project on understanding image-text content to reduce radicalization of opinions on social media.

2017: The weed vs. plant detection system I helped develop for precision fertilizing played a key part in the acquisition of Blue River Technologies by John Deere for 305 million USD.

2016: My first paper got accepted to EMNLP.

2014: Our UAV for helping locate natural disaster victims was featured in national news: Deccan Chronicle, Indian Express.

2013: I won a silver medal at SRM University Research Day for my white-paper presentation on an Electro-Mechanical Exoskeleton.

2012: I won an Academic Merit Scholarship from SRM University that waived part of my undergraduate tuition.

Projects/Publications:

Publications:

2024

Jimuyang Zhang, Zanming Huang, Arijit Ray, Eshed Ohn-Bar, FED: Feedback-Guided Autonomous Driving, CVPR 2024, [arxiv, code coming soon]

2023

Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko, Lasagna: Layered Score Distillation for Disentangled Object Relighting, [arxiv] [project page, data]

Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan Plummer, Ranjay Krishna, Kate Saenko, Cola: A Benchmark for Compositional Text-to-image Retrieval, NeurIPS 2023, [arxiv] [project page, data]

Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko, Socratis: Are Large Multimodal Models Emotionally Aware?, ICCV Workshops 2023 (oral), Workshop on Emotionally and Culturally Intelligent AI, [arxiv] [project page, data]

2022

Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko, Language-Guided Audio-Visual Source Separation via Trimodal Consistency, CVPR 2023, [arxiv] [code]

2021

Ajay Divakaran, Karan Sikka, Arijit Ray, Xiao Lin, Yi Yao, User-targeted content generation using multimodal embeddings, US Patent App. 17/191,698 [webpage]

Kamran Alipour, Arijit Ray, Xiao Lin, Michael Cogswell, Jurgen Schulze, Yi Yao, Giedrius Burachas, Improving Users' Mental Model with Attention-directed Counterfactual Edits, 2021 Applied AI Letters (Wiley) [pdf]

Arijit Ray, Michael Cogswell, Xiao Lin, Kamran Alipour, Ajay Divakaran, Yi Yao, Giedrius Burachas, Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness, 2021 Applied AI Letters (Wiley), [pdf] [arXiv] [Project Page]

2019

Arijit Ray, Karan Sikka, Ajay Divakaran, Stefan Lee, Giedrius Burachas, Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation, (EMNLP 2019), also at CVPR-W 2019 VQA and Visual Dialog Workshop, [arXiv], [bibTex] [Data]

Arijit Ray, Yi Yao, Rakesh Kumar, Ajay Divakaran, Giedrius Burachas, Can You Explain That: Lucid Explanations Help Human-AI Collaborative Image Retrieval, (AAAI-HCOMP 2019), [arXiv], [bibTex] [press coverage]

2016

Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh, "Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions.", (EMNLP 2016). [pdf] [code] [Video]

Course/Mini Projects:

Prashant Chandrasekar, Xuan Zhang, Saurabh Chakravarty, Arijit Ray, John Krulick, and Alla Rozovskaya, "The Virginia Tech System at CoNLL-2016 Shared Task on Shallow Discourse Parsing", CoNLL Shared Task (2016).

The Art of Deep Connection - Towards Natural and Pragmatic Conversational Agent Interactions. [Master's Thesis], Virginia Tech E-Library, 2017

Make RBF Networks Fast Again: Exploiting Multi-Threaded Computing to Speed Up RBF Networks, Multiprocessor Programming Class Project, Fall 2016, [draft paper] [code]

Object Prediction using Image Context: Sequentially predict the next object in an image by reasoning over the current image context, Computer Vision Class Project, Fall 2015

Online Demo for Predicting Plausibility of Common Sense Assertions: Enter a three-phrase tuple to get a plausibility score based on joint language-vision common-sense reasoning, Class Project, Fall 2015

Learning to Listen: Matching Cover songs with Original Productions: Match Original Songs to Cover Songs using an Ensemble of Supervised and Unsupervised Approaches, Machine Learning Class Project, Fall 2015.

Ray, Arijit, Kishan Prudhvi Guddanti, and N. Chellammal. "An Approach to Intelligent Traction Control Using Regression Networks and Anomaly Detection.", Junior (3rd Year) Semester Project, Fall 2013, published in Springer Applied Artificial Intelligence 29.6 (2015): 597-616.

Tutorials

CARLA Tutorial [Github Code]

Tutorial code on how to run CARLA without a display on an Ubuntu server and get image frames/sensor data

Press Coverage

2019: TechXplore, Phys.org: An image-guessing game to evaluate the helpfulness of machine explanations
2014: Deccan Chronicle, Indian Express, Engineering.Careers360: UAV with Facial Recognition Capabilities for helping locate natural disaster victims
2014: Dr. Erik Brynjolfsson, the Director of the MIT Center for Digital Business, tweeted about our drone for recognizing disaster victims:

Drones will soon be able to spot you when you walk around outside: UAV With Facial Recognition Takes Flight http://t.co/qpb2owdzGd
— Erik Brynjolfsson (@erikbryn) September 21, 2014

Collaborators

Some of the amazing people I have been fortunate to work with:

Prof. Dhruv Batra (at Virginia Tech), Prof. Stefan Lee (at Virginia Tech and SRI Intl.), Dr. Dhruv Mahajan (at FAIR), Dr. Filip Radenovic (at FAIR), Dr. Abhimanyu Dubey (at FAIR), Dr. Ajay Divakaran (at SRI Intl.), Dr. Yi Yao (at SRI Intl.), Dr. Giedrius Burachas (at SRI Intl.), Dr. Kezhen Chen (at Google X, Mineral)

Hobbies

When I am not training LLMs, I love going to techno (a subgenre of electronic music) fests, making latte art, and engineering simple gadgets. In middle school, I opened an informal research society to encourage fellow students to take an interest in science by constructing simple gadgets. We won multiple accolades in school and city-level exhibitions.

Miscellanea

Paper writing tips and tricks: Writing a good paper (by Jitendra Malik), Shortening papers (by Devi Parikh), Writing introductions (by Kate Saenko)
Paul Graham's essays: Some of my favorites: Crazy New Ideas, The Bus Ticket Theory of Genius, How to Think for Yourself, Cities and Ambition
On social media: Is it bad or beneficial to society, and why running a social media platform is hard: Twitter thread by Yishan (former CEO of Reddit).
My favorite quotes
[1] "me and my research" pun inspired by Dhruv Batra