Self-Supervised Scene Representation Learning


Introduction


Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day match people's ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: rendering and processing operations need to be differentiable, and the type of information the representations encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases, without which models converge to spurious explanations.

Focusing on 3D structure, a fundamental feature of natural scenes, Vincent will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, with self-supervision from posed images alone. He will show how this approach unlocks the learning of priors, enabling 3D reconstruction from only a single posed 2D image, and how we may extend these representations to other modalities such as sound.

Vincent will then discuss how these efforts move us toward a unified scene representation learning backbone for applications across computer vision, computer graphics, robotics, and other areas of computer science, and what key challenges remain.

About First Friday Lunches
First Friday Lunches are informal gatherings open to CSAIL Alliances members and students who would like to attend a discussion on a current project. Most virtual lunches will feature a faculty researcher, but a few will feature a PhD student, postdoc, or research scientist.

Agenda

November 5

12:00 PM ET

Self-supervised Scene Representation Learning

Vincent Sitzmann

Postdoctoral Associate
