Sindhu B. Hegde

PhD Student, University of Oxford

Hi! I am a fourth-year PhD student in the Visual Geometry Group (VGG) at the University of Oxford, supervised by Prof. Andrew Zisserman. My research is in Computer Vision, particularly in understanding non-verbal communication (including co-speech gestures and lip-reading), video understanding, and self-supervised learning. I also work as an AI Scientist at Rode Microphones, focusing on multimodal LLM-based research.

Prior to joining Oxford, I worked as a Lead Data Scientist at Verisk Analytics. Before that, I pursued a Master's by Research (MS) at the Centre for Visual Information Technology (CVIT), IIIT Hyderabad, supervised by Prof. C V Jawahar (IIIT-H) and Prof. Vinay Namboodiri (University of Bath, UK). My Master's research focused on exploiting the redundancies in vision and speech modalities for cross-modal generation. Earlier, I completed my undergraduate studies at KLE Technological University, advised by Prof. Shankar Gangisetty and Prof. Uma Mudenagudi.

Research interests: Computer Vision, Machine Learning, Deep Learning, Video Understanding, Multi-modal Learning: Vision + Speech/Language

News [Archive]

Nov 2025 I am honoured to have been awarded the 2025 Google PhD Fellowship in Machine Perception.
Jul 2025 JEGAL has been accepted to ICCV 2025 (ORAL). See you in Hawaii 🏝️⛱️ 🌊
Apr 2025 Our paper on Understanding Co-speech Gestures in-the-wild is up on arXiv.
Links: Project page, Dataset
Jan 2025 Our paper on Scaling Multilingual Visual Speech Recognition was accepted to ICASSP 2025 (ORAL).
Links: Project page, Dataset
Sep 2023 Our paper on GestSync: Determining who is speaking without a talking head was accepted to BMVC 2023 (ORAL).
Links: Project page, Demo
Jul 2023 Participated in the International Computer Vision Summer School (ICVSS) in Sicily, Italy. It was an incredible experience to learn from some of the most distinguished computer vision experts!

Talks

Oct 2025 Invited talk on “Understanding Co-speech Gestures in Videos” at the Berkeley AI Research Lab (BAIR), University of California, Berkeley. Hosted by Prof. Alyosha Efros.

Recent papers [Full list]

  1. ICCV
    Understanding Co-speech Gestures in-the-wild
    Hegde, Sindhu, Prajwal, KR, Kwon, Taein, and Zisserman, Andrew
    In International Conference on Computer Vision (ICCV) 2025
  2. ICASSP
    Scaling Multilingual Visual Speech Recognition
    Prajwal, KR, Hegde, Sindhu, and Zisserman, Andrew
    In International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
  3. BMVC
    GestSync: Determining who is speaking without a talking head
    Hegde, Sindhu, and Zisserman, Andrew
    In British Machine Vision Conference (BMVC) 2023