Inside the World of Computer Vision: A Q&A with Scott Wehrwein

Computer Vision, the field of research devoted to giving computers the ability to perceive the world through images, is becoming a bigger part of our daily lives, from facial-recognition security on smartphones to content-based image search. Western Today talked with WWU Assistant Professor of Computer Science Scott Wehrwein about his research in Computer Vision and how it could continue to impact our lives in the years ahead.


Western Today: Scott, your research focuses on an area of Computer Science called “Computer Vision.” How would you explain the field to a neophyte, in a few sentences?

SW: Computer vision is all about giving computers the ability to perceive the world through images. Many of these abilities are things that we as humans take for granted, for example recognizing faces and objects, or perceiving depth and motion. Computer vision researchers also work on tasks that would be too large-scale for humans to solve, such as building massive 3D reconstructions, mining internet images for fashion trends, and studying how the world changes over time by processing webcam streams.


WT: What are some of the most impactful applications coming from the computer vision field?

SW: In the past 10 to 15 years, some computer vision techniques have matured to the point of practicality, and meanwhile digital cameras went from a curiosity to a necessity. The most widely recognizable applications today include smartphone and camera features we now take for granted, like in-camera face and smile detection, panorama stitching, and video stabilization. Other hugely impactful applications are becoming increasingly ubiquitous: content-based image search is now built into most popular photo sharing platforms; face recognition techniques are being used to unlock phones and replace airline boarding passes; object detection and scene understanding play a huge role in today’s self-driving car systems.


WT: One of the areas being researched that is opening up new fields is “Deep Learning,” the ability of computers to actually learn. This seems both fascinating and terrifying, but can you explain how it works?

SW: There has been a lot of hype in the popular press about deep learning, and some of it is well-deserved. That said, at its core it is about training computers to make predictions based on past examples. Deep learning refers to a specific sub-area of a larger field called Machine Learning, which itself is a sub-area of Artificial Intelligence (AI). The basic idea is that despite our best efforts, we don’t know how to write algorithms to solve certain problems, like recognizing cats in images. Fortunately, we do have a lot of examples of images containing cats. So instead of telling the computer exactly how to recognize a cat, we can use machine learning techniques to “train” models to recognize cats based on all these examples.
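
To make the idea of “training from examples” concrete, here is a minimal sketch in Python. It is not deep learning and not Wehrwein’s code: it fits a tiny logistic-regression classifier, one of the simplest trainable models, to a handful of made-up examples. The two numbers per example are hypothetical stand-ins for measurements that would be taken from an image, and the label 1 means “cat.” Nobody writes a rule for what a cat looks like; the program adjusts its weights until its predictions match the labeled examples.

```python
import math

# Hypothetical toy data: each example is (features, label). The two numbers are
# made-up stand-ins for measurements taken from an image; label 1 means "cat".
EXAMPLES = [
    ([2.0, 1.5], 1),
    ([1.5, 2.5], 1),
    ([2.5, 2.0], 1),
    ([-2.0, -1.0], 0),
    ([-1.5, -2.5], 0),
    ([-2.5, -2.0], 0),
]


def train_classifier(examples, epochs=200, learning_rate=0.1):
    """Fit a logistic-regression model to labeled examples by gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            prob = 1.0 / (1.0 + math.exp(-score))  # predicted chance of "cat"
            error = prob - label  # gradient of the log-loss w.r.t. the score
            # Nudge each parameter so this example's prediction is less wrong.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, features):
    """Return 1 ("cat") if the model's score is positive, else 0."""
    weights, bias = model
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


model = train_classifier(EXAMPLES)
print(predict(model, [2.0, 2.0]))    # a new, cat-like example
print(predict(model, [-2.0, -2.0]))  # a new, non-cat-like example
```

Deep learning follows the same loop of predicting, measuring the error, and adjusting parameters, but with models that have millions of parameters and inputs that are raw pixels rather than two hand-picked numbers.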

Deep learning is about scaling up these techniques: it uses models that require huge amounts of computation, typically accelerated with specialized hardware, and trains them on millions of images or other data points. Deep learning’s success has led to rapid advances in computer vision as well as many other areas, and to widespread adoption in technology companies and in other industries from finance to manufacturing.


WT: One of your areas of focus is using Computer Vision to create images that you couldn’t normally capture with a camera yourself, like this short video. What fascinates you about this kind of work?

SW: I was an amateur photographer before I became a computer vision researcher, and what motivates me are applications that allow me to see, or to visualize, the world in new and different ways. Photo enhancement and manipulation have been around as long as photography itself. But digital imaging and the advent of techniques that can “understand” images and videos enable whole new ways of thinking about photography and videography, and this is the idea behind the field we call Computational Photography.

As an example, the technique that created the video linked above was inspired by the limitations of time-lapse photography, where an artist sets up a camera to take a photo every few seconds or so, then compiles the frames into a video. The problem is that different things move at different speeds: to see people walking, you’d want to show a standard video with 30 frames per second; visualizing clouds is best done by taking one image every 10 seconds or so; shadows move across the ground even more slowly than that, requiring one frame every 30 seconds to make the motion visible. What this technique enables is a video of this scene that never truly took place, but that visualizes all the motions in the scene better than any single time-lapse sequence could: you can watch a few seconds of this video and see what’s going on at many timescales, from seconds to minutes to hours.
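
The arithmetic behind playing each part of a scene at its own rate can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the technique behind the video: it assumes the scene was recorded as ordinary 30 frames-per-second footage, that the image has already been split into hypothetical “people,” “clouds,” and “shadows” regions, and it only computes which source frame each region would be copied from. Segmenting the regions and blending them seamlessly, the hard parts that motion analysis and image compositing address, are left out.

```python
SOURCE_FPS = 30  # assumed capture rate of the hypothetical source footage
OUTPUT_FPS = 30  # playback rate of the output video

# Hypothetical per-region sampling intervals (seconds of real time between
# consecutive output frames), using the rates described in the interview.
REGION_INTERVALS = {
    "people": 1 / 30,  # ordinary real-time video
    "clouds": 10,      # one image every 10 seconds
    "shadows": 30,     # one image every 30 seconds
}


def source_frame_indices(output_frame, intervals=REGION_INTERVALS, source_fps=SOURCE_FPS):
    """For one output frame, return the source-frame index each region is copied from."""
    return {
        region: round(output_frame * interval * source_fps)
        for region, interval in intervals.items()
    }


def real_time_per_output_second(intervals=REGION_INTERVALS, output_fps=OUTPUT_FPS):
    """Seconds of real-world time that one second of output video covers, per region."""
    return {region: output_fps * interval for region, interval in intervals.items()}


print(source_frame_indices(2))
print(real_time_per_output_second())
```

Under these assumptions, one second of output shows about one second of walking people, five minutes of cloud motion, and fifteen minutes of shadow motion side by side, which is why a short clip can reveal several timescales at once.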

My work is motivated by applications like these, and it spans a range from creating the visualizations themselves to working on the fundamental computer vision techniques (like motion analysis and image compositing) that make them possible.


Scott Wehrwein got his doctorate from Cornell University in 2018, and has taught at Western since last fall.