Dartmouth Researchers Training Machines to See Like Humans


June 6, 2016 – Dartmouth College researchers are using eye tracking data to create a new way to teach machines to see video like humans.

In the future, the researchers plan to train machines with other forms of human perception measurements beyond eye tracking data, such as human brain activity measured by functional magnetic resonance imaging (fMRI). Their goal is to develop technologies that advance the fields of machine intelligence and human communication.

The study is reported in a working paper that has yet to be published. A PDF is available on request. A video shows eye fixations predicted by the Dartmouth system, and additional videos show some of the sports actions recognized by the system.

Deep learning is a form of machine learning in which rich data representations are learned simultaneously with the model, eliminating the need to engineer features by hand. Over the last few years, deep learning has revolutionized the field of still-image recognition by producing breakthrough results in several domains, including object detection, scene classification and semantic segmentation. While there has been widespread expectation that these performance improvements will naturally extend to the video domain, results so far have lagged.

The Dartmouth researchers are recording the eye fixations of human subjects as they watch video. They then use those human perception measurements to train computers to understand video as humans do. To do this, they build algorithms that predict the brain activity and eye fixations of human subjects from video input. These predictions help machines focus automatically on the most salient information in the video and predict what will happen next.
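
As a rough illustration only, the idea of letting predicted eye fixations steer a machine toward the most salient parts of a frame can be sketched as a weighted average of per-location features. This is not the Dartmouth system, whose working paper is unpublished. The function names, the grid layout and the toy numbers below are all assumptions made for the example.

```python
def normalize_fixation_map(fixation_map):
    """Scale a grid of non-negative fixation scores so they sum to 1."""
    total = sum(sum(row) for row in fixation_map)
    if total <= 0:
        # No fixations predicted: fall back to uniform attention.
        rows, cols = len(fixation_map), len(fixation_map[0])
        return [[1.0 / (rows * cols)] * cols for _ in range(rows)]
    return [[value / total for value in row] for row in fixation_map]


def saliency_weighted_pool(features, fixation_map):
    """Average per-location feature vectors, weighted by predicted fixations.

    features[r][c] is a list of floats describing frame location (r, c);
    fixation_map[r][c] is the (hypothetical) predicted fixation score there.
    """
    weights = normalize_fixation_map(fixation_map)
    dim = len(features[0][0])
    pooled = [0.0] * dim
    for r, row in enumerate(features):
        for c, vector in enumerate(row):
            for k in range(dim):
                pooled[k] += weights[r][c] * vector[k]
    return pooled


# Toy 2x2 frame with 2-dimensional features (made-up values).
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[2.0, 2.0], [4.0, 0.0]]]
# Made-up fixation scores: viewers mostly look at the bottom-right cell.
fixation_map = [[0.0, 0.0],
                [1.0, 3.0]]
pooled = saliency_weighted_pool(features, fixation_map)
```

In this toy case the pooled description is dominated by the bottom-right location, where most of the assumed fixations fall, so regions that viewers ignore contribute little or nothing to what the machine goes on to analyze.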

“Our research aims also to advance understanding of how the human brain represents video stimuli,” says Lorenzo Torresani, an associate professor of computer science at Dartmouth who is conducting the research with James Haxby, a professor of psychological and brain sciences at Dartmouth. “The visual attention modeling and the video-to-fMRI mapping learned by our architecture will provide a new platform to explore the neural mechanisms of dynamic human vision. We argue that fMRI and eye tracking data represent a largely unexplored but highly promising source of training data for learning computational vision models. If successful, our proposal can significantly improve the accuracy of automatic video understanding applications and open a new way to teach machines to see – by training them to mimic our own visual system.”

The research builds on previous achievements in machine learning that involve creating deep neural networks of many layers, each building on the one before it, which Torresani says is the start of computer reasoning.

Available to comment are Lorenzo Torresani at Lorenzo.Torresani@dartmouth.edu and James Haxby at James.V.Haxby@dartmouth.edu.


Broadcast studios: Dartmouth has TV and radio studios available for interviews.