Projects

A list of exciting projects that I have been a part of! It will be updated as and when possible.

publications


Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Sindhu B Hegde*, Rudrabha Mukhopadhyay*, Vinay P Namboodiri, C. V. Jawahar
ACM-MM, 2022
arXiv / code / project page

Super-resolve extremely low-resolution videos (e.g., 8x8 pixels) to obtain realistic, high-resolution outputs (256x256 pixels). Upsampled the videos at a scale factor of 32x for the first time!
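
A minimal sketch of the 32x setup (illustrative only, not the paper's code): a 256x256 frame is downsampled to 8x8 to form the extreme low-resolution input that the model must super-resolve back to 256x256. The tensor shapes and the use of bilinear interpolation here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative only: a 256x256 RGB frame (batch, channels, height, width).
hr_frame = torch.rand(1, 3, 256, 256)

# Downsample to 8x8 to simulate the extreme low-resolution input.
lr_frame = F.interpolate(hr_frame, size=(8, 8), mode="bilinear", align_corners=False)

print(lr_frame.shape)                            # torch.Size([1, 3, 8, 8])
print(hr_frame.shape[-1] // lr_frame.shape[-1])  # scale factor: 32
```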

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
Sindhu B Hegde*, K R Prajwal*, Rudrabha Mukhopadhyay*, Vinay P Namboodiri, C. V. Jawahar
ACM-MM, 2022
arXiv / code / project page

Generates speech for silent talking-face videos for any speaker in the wild! One of the first models to work for arbitrary speakers, with no explicit constraints on the domain or vocabulary.

Audio-Visual Speech Super-Resolution
Rudrabha Mukhopadhyay*, Sindhu B Hegde*, Vinay P Namboodiri, C. V. Jawahar
BMVC, 2021 (Oral Presentation)
pdf / presentation / project page

An audio-visual model to super-resolve very low-resolution speech signals (e.g., 1kHz) & generate high-quality speech (16kHz). Works even if the real visual stream is unavailable/corrupted, using the proposed pseudo-visual approach!
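
To illustrate the sampling rates involved (a sketch under assumed parameters, not the paper's pipeline): 16kHz speech can be decimated to 1kHz to simulate the low-resolution input, and the model's task is then to recover the 16kHz signal.

```python
import numpy as np
from scipy.signal import resample_poly

sr_high, sr_low = 16000, 1000  # high-quality and low-resolution sampling rates

# Illustrative only: one second of 16 kHz "speech" (replace with real audio).
speech_16k = np.random.randn(sr_high).astype(np.float32)

# Decimate 16 kHz -> 1 kHz to simulate the low-resolution input (16x fewer samples).
speech_1k = resample_poly(speech_16k, up=sr_low, down=sr_high)
print(speech_1k.shape)  # (1000,) -- the model must recover the 16000-sample signal
```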

Visual Speech Enhancement Without A Real Visual Stream
Sindhu B Hegde*, K R Prajwal*, Rudrabha Mukhopadhyay*, Vinay P Namboodiri, C. V. Jawahar
WACV, 2021
arXiv / demo video / presentation / code / project page

A new paradigm for speech enhancement that works effectively in unconstrained, high-noise, real-world environments. A hybrid approach to hallucinate the visual stream using only the noisy speech as input.

Towards Automatic Speech to Sign Language Generation
Parul Kapoor, Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay P Namboodiri, C. V. Jawahar
INTERSPEECH, 2021
arXiv / demo video / code / project page

Generated continuous sign-language videos solely from speech segments for the first time. Also curated and released the first Indian Sign Language (ISL) dataset comprising speech annotations, transcripts & sign-language videos.

The Interplay of Speech and Lip Movements
Rudrabha Mukhopadhyay*, K R Prajwal*, Sindhu B Hegde*, Vinay P Namboodiri, C. V. Jawahar
ICPR Demonstrations, 2020
demo video / writeup

Extensively explores the correlation between the vision and speech modalities, specifically between speech and lip movements.

implementations


Audio-Visual Speaker Separation
code / colab / presentation

Separates the two speakers talking simultaneously in a cocktail-party-like situation. An audio-visual model to enhance & isolate the speech of the target speaker (a sketch of the masking idea follows the paper list below).

Papers referred to:
(1) The Conversation: Deep Audio-Visual Speech Enhancement
(2) Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
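
A minimal sketch of the mask-based separation idea (the mask predictor below is a hypothetical stand-in for the actual audio-visual model): the mixture is converted to a spectrogram, a time-frequency mask for the target speaker is applied, and the masked spectrogram is inverted back to a waveform.

```python
import numpy as np
import librosa

def separate_target(mixture, mask_fn, n_fft=512, hop=160):
    """Isolate the target speaker by masking the mixture spectrogram."""
    # Spectrogram of the two-speaker mixture.
    mix_spec = librosa.stft(mixture, n_fft=n_fft, hop_length=hop)

    # mask_fn is a hypothetical stand-in for the audio-visual model that
    # predicts a time-frequency mask (values in [0, 1]) for the target speaker.
    mask = mask_fn(np.abs(mix_spec))

    # Masked spectrogram -> waveform of the isolated speaker.
    return librosa.istft(mix_spec * mask, hop_length=hop, length=len(mixture))

# Toy usage: an identity mask on random audio (keeps the full mixture).
mixture = np.random.randn(16000).astype(np.float32)
isolated = separate_target(mixture, mask_fn=lambda mag: np.ones_like(mag))
```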

Speech-driven Lip Synthesis
code

Generates a talking-face video from a still image of the target identity & the corresponding speech segment. Works for unseen faces & audio! (A rough sketch of the interface follows the paper list below.)

Papers referred to:
(1) You said that?
(2) Towards Automatic Face-to-Face Translation
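
A rough sketch of the interface such a model exposes (the generator call is hypothetical; only the mel-spectrogram preprocessing is standard practice): a still image of the target identity and the speech mel-spectrogram go in, and a sequence of talking-face frames comes out.

```python
import numpy as np
import librosa

def speech_to_mel(wav_path, sr=16000, n_mels=80, hop=200):
    """Log-mel spectrogram features that condition the lip-synthesis generator."""
    wav, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels, hop_length=hop)
    return np.log(mel + 1e-5)  # shape: (n_mels, num_frames)

# Hypothetical usage: `generator` stands in for the trained lip-synthesis network.
# frames = generator(identity_image, speech_to_mel("speech.wav"))
```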