towards automation for sound effects

MIT/CSAIL researchers add realistic sounds to silent videos, a step toward automating sound effects for movies.

MIT researchers have developed a computer system that independently adds realistic sounds to silent videos. Although the technology is nascent, it’s a step toward automating sound effects for movies.

“From the gentle blowing of the wind to the buzzing of laptops, at any given moment there are so many ambient sounds that aren’t related to what we’re actually looking at,” says MIT PhD student Andrew Owens. “What would be really exciting is to somehow simulate sound that is less directly associated to the visuals.”

The notion of artificial sound generation has been around for some time now, with concepts such as procedural audio, and in many ways it’s long overdue that the same amount of attention and computing power afforded to visual effects be directed towards sound generation. CSAIL is the largest research laboratory at MIT and one of the world’s most important centres of information technology research. I have found several articles which discuss this new development and have selected sections of them here:

See the demonstration video here.

The following is a selection from those articles:

“Researchers envision future versions of similar algorithms being used to automatically produce sound effects for movies and TV shows, as well as to help robots better understand objects’ properties.

“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” says CSAIL PhD student Andrew Owens, who was lead author on an upcoming paper describing the work. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”


The team used techniques from the field of “deep learning,” which involves teaching computers to sift through huge amounts of data to find patterns on their own. Deep learning approaches are especially useful because they free computer scientists from having to hand-design algorithms and supervise their progress.

The paper’s co-authors include recent PhD graduate Phillip Isola and MIT professors Edward Adelson, Bill Freeman, Josh McDermott, and Antonio Torralba. The paper will be presented later this month at the annual conference on Computer Vision and Pattern Recognition (CVPR) in Las Vegas.

In a series of videos of drumsticks striking things — including sidewalks, grass and metal surfaces — the computer learned to pair a fitting sound effect, such as the sound of a drumstick hitting a piece of wood or of rustling leaves.

The findings are an example of the power of deep learning, a type of artificial intelligence whose application is trendy in tech circles. With deep learning, a computer system learns to recognize patterns in huge piles of data and applies what it learns in useful ways.

In this case, the researchers at MIT’s Computer Science and Artificial Intelligence Lab recorded about 1,000 videos of a drumstick scraping and hitting real-world objects. These videos were fed to the computer system, which learns what sounds are associated with various actions and surfaces. The sound of the drumstick hitting a piece of wood is different than when it disrupts a pile of leaves.

Once the computer system had all these examples, the researchers gave it silent videos of the same drumstick hitting other surfaces, and they instructed the computer system to pair an appropriate sound with the video.
To do this, the computer selects a pitch and loudness that fits what it sees in the video, and it finds an appropriate sound clip in its database to play with the video.
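That retrieval step can be pictured as a nearest-neighbour lookup. This is only a toy sketch, not the researchers’ actual model (which used deep learning over far richer features); the two-dimensional “pitch/loudness” features and the clip names below are entirely hypothetical:

```python
# Toy sketch of the retrieval idea: given features describing a silent video
# moment, find the training clip whose sound features are closest, and reuse
# its audio. The features and clip names here are invented for illustration.
import numpy as np

def nearest_sound(query_features, sound_features, sound_clips):
    """Return the stored clip whose feature vector is nearest the query."""
    dists = np.linalg.norm(sound_features - query_features, axis=1)
    return sound_clips[int(np.argmin(dists))]

# Hypothetical database: three clips with (pitch in Hz, loudness 0-1) features.
features = np.array([[220.0, 0.3],   # wood tap: low pitch, quiet
                     [880.0, 0.9],   # metal hit: high pitch, loud
                     [440.0, 0.5]])  # rustling leaves: mid pitch, moderate
clips = ["wood_tap.wav", "metal_hit.wav", "leaves_rustle.wav"]

print(nearest_sound(np.array([200.0, 0.4]), features, clips))  # wood_tap.wav
```

The real system predicts sound features frame by frame from the video and searches a database of roughly 1,000 recorded impacts, but the matching principle is the same.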

To demonstrate their accomplishment, the researchers then played half-second video clips for test subjects, who struggled to tell whether the clips included an authentic sound or one that the computer system had added artificially.
But the technology is not perfect, as MIT PhD candidate Andrew Owens, the lead author on the research, acknowledged. When the team tried longer video clips, the computer system would sometimes misfire and play a sound when the drumstick was not striking anything. Test subjects immediately knew the audio was not real.

And the researchers were able to get the computer to produce fitting sounds only when they used videos with a drumstick. Creating a computer that automatically provides the best sound effect for any video — the kind of development that could disrupt the sound-effects industry — remains out of reach for now.

Although the technology world has seen significant strides of late in artificial intelligence, there are still big differences in how humans and machines learn. Owens wants to push computer systems to learn more similarly to the way an infant learns about the world, by physically poking and prodding its environment. He sees potential for other researchers to use sound recordings and interactions with materials such as sidewalk cement as a step toward machines’ better understanding our physical world.

 

taken from this article
and this webpage


The Computer Science and Artificial Intelligence Laboratory – known as CSAIL ­– is the largest research laboratory at MIT and one of the world’s most important centers of information technology research.
CSAIL and its members have played a key role in the computer revolution. The Lab’s researchers have been key movers in developments like time-sharing, massively parallel computers, public key encryption, the mass commercialization of robots, and much of the technology underlying the ARPANet, Internet and the World Wide Web.  
CSAIL members (former and current) have launched more than 100 companies, including 3Com, Lotus Development Corporation, RSA Data Security, Akamai, iRobot, Meraki, ITA Software, and Vertica. The Lab is home to the World Wide Web Consortium (W3C), directed by Tim Berners-Lee, inventor of the Web and a CSAIL member.

 

Tales From The Bridge – Martyn Ware


At level 1, one of the first assessment tasks I ask students to undertake is the creation of a soundscape. However, for some, the very notion of the soundscape is unfamiliar. Soundscapes can take many different forms – some can be very challenging for the listener/audience.

Whilst listening to Radio 4 this morning, I heard a short interview with Martyn Ware (The Human League, Heaven 17) in which he explains the concept behind his Tales From The Bridge soundscape currently installed at London’s Millennium Bridge. This is an excellent example of an accessible approach to the creation of a soundscape and hopefully one which will inspire some of our students’ creativity.

Listen to the clip here

Mr FOLEY – short film


A darkly funny but nightmarish scenario: a man wakes up in hospital to find a group of sound artists soundtracking his life. Mr Foley is an award-winning short film directed by Dublin directing duo Mike Ahern & Enda Loughman, aka D.A.D.D.Y. The film has been on the festival circuit for a while but has just premiered online for all to see, YAY!

link to VIMEO
Written & Directed by D.A.D.D.Y.
Short film
Duration: 4:50

Working With Binaural: Bringer

This post was submitted by level 3 Audio Production student Matt North.

For the first of two audio projects required on the 3rd year of Audio Production, Luke Pickering and I decided to experiment with binaural audio.  What started out as an idea of producing a 5.1 surround sound mix for an animation rapidly developed into writing and producing our own short horror film, which focussed on the binaural soundtrack to induce fear upon the audience.

The premise for our film is unusual: it takes the form of three short films, each representing one of the three characters’ first-person perspectives.  We wrote a script based on this idea so that, in order to fully understand the entire storyline, all three films need to be viewed together.  The films are to be exhibited across three screens at the Degree Show, allowing three audience members to each experience a character’s involvement in the film and then converse with the other audience members afterwards to understand what happened in their film.

Luke was aware of an abandoned RAF building on the outskirts of Lincoln, which we visited in an attempt to draw up ideas for the storyline of our film.  The place itself was extremely desolate and had a strong sense of isolation from the city; in other words, it was very creepy.

With our experience in radio drama script writing from the 2nd year, we wrote a script based upon the graffiti within the building and created the fictional storyline of three art fanatics searching for the early work of a popular graffiti artist, Thomas F. Bringer.  We wanted to come up with an original and non-clichéd idea and felt that we could portray the feeling of horror by manipulating the binaural soundtrack.

Following the guidance of such websites as DigDagga.com, a blog on binaural audio, we invested £190 in some in-ear binaural microphones from the USA.  On location, we set our actors up with a digital camera gaffer-taped to a sports headband around their heads and placed the binaural mics in their ears.  The mics were extremely sensitive and we had to do some rigorous testing to ensure that we would record the best possible signal and not have the gain set too high.

For scenes such as Dan’s attack, we really wanted to play upon the binaural aspect and thought of many ways in which we could inject both realism and fear into the soundtrack.  Upon reflection, the sound of the tape being wrapped around Dan’s head really is horrifying.  We had no problems with the audio on location until we reviewed that particular scene, when we realised that the gain had not been set to accommodate the screaming, which resulted in heavy clipping.  We both decided that despite it sounding terrible, it actually added to the sense of horror we were trying to convey.

We recorded some binaural Foley on location, such as the coughing up of blood in Dan’s film and also some bangs from the main room that are evident in Ed’s, and placed them within the original recorded audio.  Because they were recorded on location, we didn’t have to worry about matching the reverbs to the room and they slotted into the soundtrack smoothly.  As we wanted everything the audience heard in the film to be binaural, we completed some post-production Foley with the binaural mics as well.  These were then manipulated to add to the sound of Dan’s attack and death, as well as Lisa’s panic attack, to give an extra sense of realism.

After the soundtrack was ready we researched EQ matching, which we discovered was necessary to add extra realism to the films.  This involved playing white noise in the Sound Theatre into the binaural microphones and then using iZotope Ozone’s EQ-matching plug-in to read the frequency response of the mics.  This response was then inverted to bring the frequency spectrum to a flat level, thereby replicating human hearing as closely as possible.  This was essential learning in the use of binaural.
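The inversion step above can be sketched in a few lines. This is only an illustration of the maths (the project itself used a commercial matching-EQ plug-in): average the mic’s magnitude response to white noise, express it in dB relative to the mean bin, and negate it, so that applying the resulting curve flattens the mic’s response.

```python
# Sketch of deriving a flattening EQ curve from a white-noise recording made
# through the microphone under test. Negating the measured response in dB
# boosts what the mic cut and cuts what it boosted.
import numpy as np

def correction_curve_db(recording, frame_len=1024):
    """Inverted average magnitude spectrum (dB, relative to the mean bin)."""
    n = len(recording) // frame_len
    frames = recording[:n * frame_len].reshape(n, frame_len)
    mags = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    response_db = 20 * np.log10(mags / mags.mean())
    return -response_db  # apply this EQ to flatten the mic's response
```

For an ideally flat mic the curve sits near 0 dB everywhere; frequency bins where in-ear capsules roll off come back as positive values, i.e. a boost.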

Whilst the majority of binaural experimentation has been through the use of dummy heads, with BRINGER we attempted to create a realistic and professional binaural soundtrack on a small budget using in-ear binaural mics.  This process has ultimately taught us a lot about recording binaurally, and I would recommend that anyone attempt and experiment with the advantages binaural can bring to a production.

The Freesound Project

 

The Freesound Project aims to create a huge collaborative database of audio snippets, samples, recordings, bleeps, … released under the Creative Commons Sampling Plus License.  The Freesound Project provides new and interesting ways of accessing these samples, allowing users to browse the sounds in new ways using keywords, a “sounds-like” type of browsing and more; to upload and download sounds to and from the database under the same Creative Commons license; and to interact with fellow sound artists!
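The database can also be searched programmatically. A hedged sketch follows: the endpoint shape is that of Freesound’s public API (v2), and `YOUR_API_KEY` is a placeholder for a key you would register for on the site.

```python
# Build a text-search URL for Freesound's /search/text/ endpoint.
# YOUR_API_KEY is a placeholder; an account and API key are required.
from urllib.parse import urlencode

API_ROOT = "https://freesound.org/apiv2"

def search_url(query, token, page_size=5):
    """Return a URL that searches the Freesound database for `query`."""
    params = urlencode({"query": query, "page_size": page_size, "token": token})
    return f"{API_ROOT}/search/text/?{params}"

print(search_url("rain", "YOUR_API_KEY"))
# fetch the URL (e.g. with urllib.request.urlopen) and read the JSON "results" list
```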

They are looking for institutions and schools who want to help them with this effort. Could our institution help fill the database? Are there any students who can help by doing (field-)recordings as assignments? Recordings of instruments? Do you have a large batch of usable sounds? Anything goes as long as the sounds can be released under the Creative Commons Sampling Plus License.

Presentation Day

Today was an assessment presentation day for Level 1 Audio Production students. In small groups, they pitched their ideas for a music or sound design concept for the Electronic Music Production module. Being the creative bunch they are, ideas were wide ranging; from a drum and bass remix, to a sound design for a horror game, to an audio aid for relaxation, to a theme tune for a game show! The presentations are a great way for students to see, hear and comment on each other’s work and to receive developmental feedback. They now have to turn their concepts into fully formed audio products for the end of the semester. I’m looking forward to hearing them!