WRITTEN BY: Audrey Woods
As self-driving cars become a market reality, image generators like DALL-E take the internet by storm, and medical AI startups offer new ways to interpret clinical images, the topic of visual computing has come into focus for industry professionals. Intelligent vehicles, retail, 3D printing & manufacturing, photography, clinical AI, and robotics are just a few of the many fields affected by this branch of computer science. With the computer vision market predicted to reach $48 billion by 2023, visual computing’s industry footprint is only going to get larger as these technologies improve.
What is visual computing? This field of research—sometimes called vision and graphics—covers the union of computer vision, graphics, and imaging. Supported by major events such as the Computer Vision and Pattern Recognition Conference, it is a thriving area of academic inquiry, with dedicated visual computing groups at most major academic institutions. Here at MIT, CSAIL’s robust team of researchers has produced breakthroughs such as the late Professor Seth Teller’s visionary work on city scanning, former MIT Associate Professor Leonard McMillan’s image-based rendering methods, Professor William Freeman’s participation in the first images of the supermassive black hole at the center of our galaxy, and the Halide compiler from Professor Frédo Durand, Assistant Professor Jonathan Ragan-Kelley, and Professor Saman Amarasinghe, now used by Google, Facebook, Adobe, and more. The Visual Computing Community of Research at MIT has also produced startup companies like PathAI, a medical AI tool for pathology diagnostics, and Hosta a.i., which combines computer vision and architecture for automated property assessments.
Research in visual computing continues to grow and expand, especially at MIT, and the technologies being invented at CSAIL right now are sure to have a major impact on industry in the future, just as they have in the past.
Here’s a sampling of what’s going on at CSAIL and how this research could affect daily life.
Image Recognition, Generation & Manipulation
As one might imagine, the first big hurdle in visual computing is enabling a computer to understand and process image data. We take for granted our ability to look at a picture and immediately understand its perspective, the delineations between objects, what those objects are, and where they sit in 3D space. But this is much harder for machines, which can struggle with unclear boundaries between objects, identifying differences between two images, and topology.
To address this, CSAIL researchers have been studying the effectiveness of deep features as a perceptual metric and applying deep features—the internal representations of convolutional neural networks trained on high-level tasks such as object classification—for discriminative localization, allowing models to both classify and localize objects in an image. In conjunction with Cornell University, MIT computer scientists developed a vision model called STEGO that can identify and segment the objects in an image entirely without human help. Principal Research Scientist Ruth Rosenholtz is doing innovative research on how humans perceive visual data, which has far-reaching implications for how we design computers to “see.” Professor Antonio Torralba and Senior Research Scientist Aude Oliva created the Places Database, a 10-million-image database for training artificial systems to human-level accuracy in scene recognition. Similarly, Professor Torralba, Professor Joshua Tenenbaum, and Professor Freeman collaborated on a project to help computers understand 3D objects from 2D images with a 3D Interpreter Network. Such a program has broad potential in any industry that requires planning and design, such as retail, construction, and automobile manufacturing.
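To make the localization idea concrete, here is a minimal sketch of class activation mapping, the trick behind this kind of discriminative localization: the final convolutional feature maps of a trained classifier are weighted by the classifier’s own output weights for a class, producing a coarse heatmap of where that class appears. The pretrained model and input file below are illustrative choices, not details taken from the papers.

```python
# A minimal sketch of class activation mapping (CAM): weight the last
# convolutional feature maps by the classifier weights for the predicted
# class to get a coarse localization heatmap. Model choice is illustrative.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # hypothetical input file

with torch.no_grad():
    # Run the backbone up to the last conv block: (1, 512, 7, 7) feature maps.
    feats = torch.nn.Sequential(*list(model.children())[:-2])(img)
    cls = model(img).argmax(dim=1).item()

    # CAM: weighted sum of feature maps using the fc weights for that class.
    w = model.fc.weight[cls]                      # (512,)
    cam = torch.einsum("c,chw->hw", w, feats[0])  # (7, 7) heatmap
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

print(f"predicted class {cls}; upsample `cam` onto the image to localize it")
```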
Arguably just as important as image recognition is the ability of computers to generate images for various industry needs. Everything from video games to medical technology relies on computers taking an input—whether that’s a scan or a user action—and producing some reliable visual output. To this end, Professor John Guttag’s group developed a model to generate atlases, or sample medical images that represent the prototypical anatomy of a given patient population. Assistant Professor Phillip Isola’s group developed pix2pix, software that translates one image into another using conditional adversarial networks, adding color, generating an image from a rough sketch, or even creating a full portrait or map from just a few rough lines. Professor Isola’s group also discovered a new way to expand a model’s ability to “riff” on a given image, adding creativity and real-world complexity to how algorithms see the world. Associate Professor Justin Solomon is using point clouds to provide flexible geometric representations of objects. A group of MIT professors created a program that, given a person in one pose, can create a realistic image of that same person in another pose, adding to the arsenal of tools computer engineers can use for virtual reality and artificial simulations (more on that later). On the life sciences side, Computational Biology Professor Bonnie Berger took cues from panoramic photography to create a new way to merge datasets, giving her and other researchers a fresh method to visualize data.
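As a rough illustration of the conditional adversarial setup behind this kind of image-to-image translation, the sketch below pairs a toy generator with a discriminator that judges (input, output) image pairs, combining an adversarial loss with an L1 reconstruction term. The tiny architectures and random tensors are deliberate stand-ins, not the published networks or data.

```python
# A minimal sketch of the conditional-adversarial idea behind image-to-image
# translation: the discriminator judges (input, output) pairs, and the
# generator is trained with an adversarial term plus an L1 term.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 3, 3, padding=1))           # toy generator
D = nn.Sequential(nn.Conv2d(6, 64, 4, stride=2), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4))                       # toy patch discriminator

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

x, y = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)   # stand-in (sketch, photo) pair

# Discriminator step: real pairs -> 1, generated pairs -> 0.
fake = G(x)
d_real = D(torch.cat([x, y], dim=1))
d_fake = D(torch.cat([x, fake.detach()], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator while staying close to the target.
d_fake = D(torch.cat([x, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, y)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```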
Finally, as our image-capture technology improves, there’s an opportunity for computers to not only process and generate but also enhance images in a variety of ways. Professor Solomon co-authored a paper on the vectorization of line drawings, which improves computer-assisted image tracing. Such a program is useful not just to the animation industry, which has been hesitant to adopt such time-saving techniques because of their previous unreliability, but also in other areas such as architecture, real estate, and creating maps from GPS traces. Professor Frédo Durand has done extensive work on real-time image enhancement and on flash enhancement of photos shot in dark environments, and is currently working on computational bounce flash for indoor portraits, all of which broadens what’s possible in photography. Professor Isola was previously involved in creating a system more successful at colorizing black-and-white photographs than previous methods. And CSAIL’s Medical Vision Group, including Professor Polina Golland, created a way to boost the quality of brain scans so that quick diagnostic MRIs can be used for large-scale studies.
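For a flavor of how this kind of edge-aware enhancement works, here is a minimal sketch in the spirit of bilateral-filter-based tone mapping: split the image into a smooth base layer and a detail layer, compress the base to brighten shadows, and re-add the detail. The file names and parameter values are illustrative, not tuned, and this is a simplification of the real-time methods cited above.

```python
# A minimal sketch of edge-aware detail enhancement: a bilateral filter
# smooths while preserving edges, splitting the image into "base" and
# "detail" layers; compressing only the base brightens shadows without
# washing out texture. Parameters are illustrative.
import cv2
import numpy as np

img = cv2.imread("dark_photo.jpg").astype(np.float32) / 255.0  # hypothetical file
log_lum = np.log1p(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Edge-preserving smoothing gives the base layer; the rest is detail.
base = cv2.bilateralFilter(log_lum, d=9, sigmaColor=0.4, sigmaSpace=16)
detail = log_lum - base

# Compress the base layer's range, then re-add detail and restore color.
enhanced_lum = np.expm1(0.6 * base + detail)
gain = enhanced_lum / (np.expm1(log_lum) + 1e-6)
out = np.clip(img * gain[..., None], 0.0, 1.0)

cv2.imwrite("enhanced.jpg", (out * 255).astype(np.uint8))
```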
Modeling, Rendering & Simulations
Extending the usefulness of image generation is the creation of 3D models, which is critical in many areas of study and industry. For example, Professor Golland’s lab recently used volumetric parameterization to create models of the placenta, allowing doctors to visualize this critical organ for better fetal health outcomes. Professor Durand has been working on differentiable rendering, including inverse rendering, which is necessary for Hollywood special effects such as de-aging. In early 2022, several CSAIL scientists published a paper describing a new technique that speeds up the real-time generation of 3D representations using light field networks, advancing the ability of, for instance, a robot to register and interpret the physical world around it in real time. Additionally, Professor Berger was recently involved in a project using machine learning to create 3D models of protein structures, which helped researchers study viral spike proteins during the coronavirus pandemic.
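The core trick of light field networks can be sketched in a few lines: rather than marching along each camera ray through a volume, a single neural network maps an oriented ray directly to a color, so rendering a pixel costs one network evaluation. The untrained toy network below only illustrates that interface; in practice such a network would be fit to posed images of a scene.

```python
# A minimal sketch of the light-field-network idea: an MLP maps an oriented
# ray (in 6D Plücker-style coordinates) directly to a color, with no
# per-ray integration loop. Untrained toy network, for illustration only.
import torch
import torch.nn as nn

def plucker(origin, direction):
    """Parameterize a ray by its normalized direction and moment vector."""
    d = direction / direction.norm(dim=-1, keepdim=True)
    m = torch.cross(origin, d, dim=-1)
    return torch.cat([d, m], dim=-1)  # 6D ray coordinates

field = nn.Sequential(
    nn.Linear(6, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),  # RGB
)

# One network call per pixel ray.
origins = torch.zeros(1024, 3)          # rays from a camera at the origin
dirs = torch.randn(1024, 3)             # toy ray directions
colors = field(plucker(origins, dirs))  # (1024, 3) predicted colors
print(colors.shape)
```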
One of the industry areas most dependent on accurate and efficient modeling is 3D printing and manufacturing. 3D printing has huge potential in all areas of the market—retail, construction, medicine, etc.—and the hardware is rapidly scaling up to meet this demand. However, this creates a need for software robust enough to handle the increasing complexity. To address this, Professor Wojciech Matusik created OpenFab, a pipeline that allows for faster, more detailed creations with multiple materials. Along those same lines, Professor Matusik published work on how to use machine learning and computer vision to help 3D printing algorithms learn and adjust as they’re running, saving time and minimizing costly errors. CSAIL Assistant Professor Mina Konaković Luković has previously done research on 3D modeling and printing deployable structures such as programmable heart stents that can be printed to curve along with an artery to minimize valve pressure.
Modeling and rendering are critical components of another exciting subset of visual computing: simulated reality. Virtual simulations are important not just for gaming and entertainment but also for manufacturing, education, healthcare, retail, and training both humans and AI. Professor Matusik’s current project creating a differentiable cloth simulator could be applied to computer animation, garment design, and robot-assisted dressing. CSAIL’s recently unveiled VISTA 2.0 software allows autonomous vehicle programs to be trained in a photorealistic simulation, which promises to increase both the efficiency and safety of self-driving cars. It’s important for computer models to have experience with edge cases and unusual scenarios, but those are difficult to find in real life, let alone repeat for robust training. Photorealistic simulations therefore offer a cheaper, effective solution to many such machine learning problems.
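To see why differentiability matters for a simulator, consider the toy mass-spring “cloth” step below, written so that the gradient of a loss on the final state flows back to a physical parameter like stiffness; gradients like this are what let designers tune garments or controllers directly through a simulation. This is an illustration of the principle only, not the published cloth simulator.

```python
# A toy differentiable simulation: two particles joined by a spring,
# integrated with explicit Euler in PyTorch so autograd can compute the
# gradient of a final-state loss with respect to the spring stiffness.
import torch

stiffness = torch.tensor(50.0, requires_grad=True)
pos = torch.tensor([[0.0, 0.0], [1.2, 0.0]])   # two particles, one spring
vel = torch.zeros_like(pos)
rest_len, dt = 1.0, 0.01

for _ in range(100):                            # explicit Euler integration
    delta = pos[1] - pos[0]
    dist = delta.norm()
    force = stiffness * (dist - rest_len) * delta / dist  # Hooke's law
    acc = torch.stack([force, -force])                    # equal and opposite
    vel = vel + dt * acc
    pos = pos + dt * vel

# Gradient of a design goal (spring settles at rest length) w.r.t. stiffness:
loss = ((pos[1] - pos[0]).norm() - rest_len) ** 2
loss.backward()
print(stiffness.grad)  # could drive gradient-based parameter optimization
```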
Video Processing
Building on the ability of computers to process and generate static images, CSAIL professors are also hard at work on the complex task of getting machines to process video data. Professor Samuel Madden offers one solution to the sheer volume of video data now available: Vaas, or Video Analytics at Scale. Designed for large-scale datasets, Vaas uses an interactive interface to rapidly solve video analytics tasks. More recently, Professor Madden has been developing machine-learning technology for running data queries over large video archives. Elsewhere in CSAIL, Professor Durand and Professor Freeman have worked together on Eulerian Video Magnification, which reveals subtle changes in the world, such as light patterns invisible to the human eye, and can even help cameras see around corners. This research led to the development of the “ShadowCam,” which has been successfully trialed in self-driving cars.
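The Eulerian idea is simple enough to sketch: treat each pixel’s intensity as a time series, bandpass filter it around the frequency band of interest, amplify the filtered signal, and add it back. The full method operates on a spatial pyramid; the flat version below, using made-up footage in place of a real video, is only illustrative.

```python
# A minimal sketch of Eulerian video magnification: bandpass filter each
# pixel's intensity over time (here, around a heart-rate band), amplify
# that signal, and add it back to the original frames.
import numpy as np
from scipy.signal import butter, filtfilt

fps, alpha = 30.0, 20.0                      # frame rate, amplification factor
video = np.random.rand(150, 64, 64)          # stand-in for (frames, H, W) footage

# Temporal bandpass around 0.8-1.2 Hz (roughly 50-70 beats per minute).
b, a = butter(2, [0.8, 1.2], btype="bandpass", fs=fps)
pulsation = filtfilt(b, a, video, axis=0)    # per-pixel temporal filtering

magnified = np.clip(video + alpha * pulsation, 0.0, 1.0)
print(magnified.shape)
```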
As with images, we not only want our machines to process video but also to generate video when necessary. For example, Professor Matusik’s lab is working on the creation of 3D holograms, which could lead to a future of home 3D entertainment systems that don’t require glasses. In 2016, Professor Torralba was involved in creating a system that could “predict” how a scene would unfold, generating new video of a plausible future from a single still image. Professor Torralba is also working on a Predictive Vision project, which aims to “develop an algorithm to anticipate visual events that may happen in the future.”
Video is also proving to be a useful way to train AI, as with the VISTA 2.0 program mentioned above. Dr. Oliva, among others, is using video to help AI understand events and actions. She recently joined with fellow CSAIL researchers to create a technique that trains AI on both audio and visual data to label actions more effectively. Dr. Oliva has also developed software that draws on neuroscience to transfer human brain processes to AI, and she created a large-scale video database for more effective machine learning. Advances such as these could prove critical in creating independent and adaptive artificial intelligence.
Robotics & AI
Speaking of AI, perhaps one of the most media-hyped applications of visual computing is its role in the creation of robots. After all, how can robots do their assigned jobs if they aren’t able to “see” the world around them? The hardware behind robot sensors has therefore become an increasingly vibrant area of study. CSAIL researchers recently developed an artificial vision system that works both on land and underwater, increasing the number of potential environments where robots might be useful. GelSight, invented in Professor Edward Adelson’s lab, uses a camera to image the deformation of a soft elastomer for accurate touch sensing, overcoming some limitations of traditional tactile technology. Research coming out of Professor Matusik’s lab is combining robotics and 3D modeling technology to optimize the process of designing robots for specific functions.
There are many projects in CSAIL’s Distributed Robotics Group that involve visual computing, including Conduct-a-Bot, which allows a user to control a drone with gestures, a wearable navigation system to assist the blind, and several projects improving the technology behind autonomous vehicles. Previous CSAIL robots, including a hair-combing bot and a robotic fish that swims underwater, depend on artificial vision, whether to model the tangled fibers of hair or to observe and replicate the behavior of underwater wildlife.
As CSAIL’s Director, Professor Daniela Rus observed in a widely publicized op-ed, robotics and autonomy are the future and industries should embrace these innovative technologies not only for their own growth but also for the good of their employees. This bright future of ubiquitous robotic assistance will be possible, in part, because of visual computing.
What’s Next in Visual Computing?
Visual computing is a dynamic area of research that has made great strides in recent years. However, there are still plenty of challenges left to address. The biggest open questions right now are (1) overcoming hardware limitations, (2) adding system flexibility, and (3) creating more efficient ways to gather and label data for algorithm training.
The hardware issues are the most externally obvious, especially in settings like autonomy, where cameras and sensors need to be adaptable, precise, and fast. Beyond that, tremendous computing power is required to support the increasingly complex things we ask computers to do, so technical constraints will become a greater challenge as the demand for malleable programs increases. Luckily, this is something CSAIL researchers are addressing by creating faster and cheaper computing methods and new sensors.
When it comes to flexibility, traditional algorithms are designed for specific tasks and lack the capacity to pivot to different queries without an entirely new training process. This means that, even within subsets of visual computing like medical imaging, AI programs must be trained for still smaller subcategories, such as gastroenterology or tumor detection. However, the demand for more general, all-purpose tools that can address multiple market needs at once has created exciting opportunities for research that MIT scientists are already pursuing. For example, the Sensing, Learning & Inference group is focused on scalable algorithms, and the Embodied Intelligence Community of Research aims to understand the nature of intelligent behavior and apply it to AI.
Finally, the training of these algorithms requires immense volumes of data, which can be difficult to both find and label. In a world with more information available than ever before, data acquisition and management are ever-present challenges, ones that many CSAIL researchers, including Professor Madden and Dr. Oliva, are hard at work on.
The visual computing industry has never been more exciting. Forbes predicts a “wave of billion-dollar computer vision startups” in the next few years, with the potential to reshape nearly every aspect of human life. The cars, robots, homes, businesses, and entertainment of the future will be radically different from what they are now, thanks to the developing ability of computers to both understand and generate visual data. It’s a big market that’s only getting bigger, and CSAIL will continue to be at the forefront of that change.
To learn more about CSAIL’s work in this space, visit VisualComputing@CSAIL.