The company’s AI project is called “Ego4D.” It involves a consortium of researchers from 13 labs and universities in nine countries. The project aims to create an AI that can “understand and interact with the world like we do” from a first-person perspective. To do this, the company plans to draw on a constant stream of video and audio from people’s lives.
“AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the center of action,” the company says in its announcement of the project. “AI that understands the world from this point of view could unlock a new era of immersive experiences.”
Kristen Grauman, Facebook’s lead AI research scientist, says: “Next-generation AI systems will need to learn from an entirely different kind of data – videos that show the world from the center of the action, rather than the sidelines.”
AI being taught using over 3,000 hours of first-person video footage
To teach the new AI how to perceive the world from the vantage point of a human, the Ego4D project gathered over 3,000 hours’ worth of first-person video from around 850 participants in nine countries.
Participants captured the first-person video by wearing GoPro cameras and augmented reality glasses, recording themselves accomplishing various tasks as they went about their daily lives.
The footage is supplemented by an additional 400 hours of first-person video captured by Facebook Reality Labs Research using test subjects who were wearing augmented reality smart glasses in staged environments.
The AI is then tasked with learning five key benchmark tasks from these 3,400 hours of footage.
The first is “episodic memory.” The AI needs to learn how to tie specific events to the correct time and location that they happened and to be able to recall them when asked.
The second benchmark is “forecasting.” The AI has to learn how to predict human actions and anticipate human needs.
The third is “social interaction.” The intricacies of socialization are innate to the human experience, but no existing AI system can properly understand them; this AI is meant to be the first to do so.
The fourth is “hand and object manipulation.” The AI must learn how people use their hands to manipulate objects, a prerequisite for teaching new skills.
The last benchmark is “audio-visual diarization.” The AI needs to keep a “diary” of video clips and audio and must be able to tie each one to a specific location and time.
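To make the first benchmark concrete, the episodic-memory task can be illustrated with a toy sketch: given a log of timestamped first-person observations, answer a “where did I last see X?” query by tying the event back to its time and place. This is purely illustrative; the data structures and function names below are hypothetical and are not part of Ego4D itself.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    timestamp: float   # seconds into the recording
    location: str      # e.g. "kitchen"
    objects: tuple     # objects visible in the frame

def last_seen(log, obj):
    """Return the most recent observation containing obj, or None.

    A toy stand-in for the episodic-memory task: recall a specific
    event (seeing the object) along with when and where it happened.
    """
    for obs in sorted(log, key=lambda o: o.timestamp, reverse=True):
        if obj in obs.objects:
            return obs
    return None

log = [
    Observation(10.0, "kitchen", ("mug", "keys")),
    Observation(95.5, "living room", ("remote", "keys")),
    Observation(140.2, "kitchen", ("mug",)),
]

hit = last_seen(log, "keys")
print(hit.location, hit.timestamp)  # → living room 95.5
```

The real benchmark works over raw video rather than neat symbolic records, but the query shape is the same: map a natural question onto the correct moment and place in a long first-person recording.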
No current AI system can perform all five of these tasks. Facebook hopes this new AI can help people recall critical information, locate objects and accomplish tasks faster and more effectively.
“Ego4D makes it possible for AI to gain knowledge rooted in the physical and social world, gleaned through the first-person perspective of the people who live in it,” says Grauman. “Not only will AI start to understand the world around it better, it could one day be personalized at an individual level – it could know your favorite coffee mug or guide your itinerary for your next family trip.”
“This will allow AI systems to help you in ways that are really in the moment, contextual with what you’re doing and what you’ve seen before, all in ways you just can’t do now,” Grauman adds.