MIT researchers have developed a computer system that independently adds realistic sounds to silent videos. Although the technology is nascent, it’s a step toward automating sound effects for movies.
Working from a series of videos of drumsticks striking sidewalks, grass, metal surfaces and other things, the computer learned to pair each video with a fitting sound effect, such as the sound of a drumstick hitting a piece of wood or rustling leaves.
The findings are an example of the power of deep learning, a type of artificial intelligence whose application is trendy in tech circles. With deep learning, a computer system learns to recognize patterns in huge piles of data and applies what it learns in useful ways.
In this case, the researchers at MIT’s Computer Science and Artificial Intelligence Lab recorded about 1,000 videos of a drumstick scraping and hitting real-world objects. These videos were fed to the computer system, which learned which sounds are associated with various actions and surfaces. The sound of the drumstick hitting a piece of wood is different from the sound it makes when it disturbs a pile of leaves.
Once the computer system had all these examples, the researchers gave it silent videos of the same drumstick hitting other surfaces, and they instructed the computer system to pair an appropriate sound with the video.
To do this, the computer selects a pitch and loudness that fit what it sees in the video, and it finds an appropriate sound clip in its database to play with the video.
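For readers curious what that matching step might look like, here is a minimal Python sketch. It is not the MIT team's code. The clip names, the feature values and the distance measure are all assumptions made up for illustration. It takes a predicted pitch and loudness and returns the stored clip whose features are closest.

```python
import math

# Hypothetical sound database: clip name -> (pitch in Hz, loudness in dB).
# These names and numbers are invented for illustration only.
CLIP_DATABASE = {
    "wood_hit.wav": (220.0, 70.0),
    "leaves_rustle.wav": (900.0, 45.0),
    "metal_clang.wav": (1500.0, 80.0),
}


def find_closest_clip(predicted_pitch, predicted_loudness, database=CLIP_DATABASE):
    """Return the name of the clip whose features are nearest the prediction."""

    def distance(features):
        pitch, loudness = features
        # Assumed distance: pitch difference in octaves, loudness difference
        # in units of 10 dB, combined as a straight-line (Euclidean) distance.
        pitch_gap = math.log2(pitch / predicted_pitch)
        loudness_gap = (loudness - predicted_loudness) / 10.0
        return math.hypot(pitch_gap, loudness_gap)

    return min(database, key=lambda name: distance(database[name]))


# Example: a prediction of a low, fairly loud impact.
best_match = find_closest_clip(250.0, 68.0)
```

In this toy example, a prediction of 250 Hz at 68 dB lands closest to the wood clip. A real system would presumably compare far richer descriptions of sound than two numbers.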
To demonstrate their accomplishment, the researchers then played half-second video clips for test subjects, who struggled to tell whether the clips included an authentic sound or one that a computer system had added artificially.