MIT/CSAIL researchers add realistic sounds to silent videos, a step toward automating sound effects for movies.
“From the gentle blowing of the wind to the buzzing of laptops, at any given moment there are so many ambient sounds that aren’t related to what we’re actually looking at,” says MIT PhD student Andrew Owens. “What would be really exciting is to somehow simulate sound that is less directly associated to the visuals.”
The notion of artificial sound generation has been around for some time now, with concepts such as procedural audio, and in many ways it’s long overdue that the same amount of attention and computing power afforded to visual effects be directed towards sound generation. CSAIL, directed by Daniela Rus, is the largest research laboratory at MIT and one of the world’s most important centres of information technology research. I have found several articles which discuss this new development and have selected sections of them here:
The following is a selection from those articles:
“Researchers envision future versions of similar algorithms being used to automatically produce sound effects for movies and TV shows, as well as to help robots better understand objects’ properties.
“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” says CSAIL PhD student Andrew Owens, who was lead author on an upcoming paper describing the work. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”
The team used techniques from the field of “deep learning,” which involves teaching computers to sift through huge amounts of data to find patterns on their own. Deep learning approaches are especially useful because they free computer scientists from having to hand-design algorithms and supervise their progress.
The paper’s co-authors include recent PhD graduate Phillip Isola and MIT professors Edward Adelson, Bill Freeman, Josh McDermott, and Antonio Torralba. The paper will be presented later this month at the annual conference on Computer Vision and Pattern Recognition (CVPR) in Las Vegas.
In a series of videos of drumsticks striking things — including sidewalks, grass and metal surfaces — the computer learned to pair a fitting sound effect, such as the sound of a drumstick hitting a piece of wood or of rustling leaves.
The findings are an example of the power of deep learning, a type of artificial intelligence whose application is trendy in tech circles. With deep learning, a computer system learns to recognize patterns in huge piles of data and applies what it learns in useful ways.
In this case, the researchers at MIT’s Computer Science and Artificial Intelligence Lab recorded about 1,000 videos of a drumstick scraping and hitting real-world objects. These videos were fed to the computer system, which learns what sounds are associated with various actions and surfaces. The sound of the drumstick hitting a piece of wood is different than when it disrupts a pile of leaves.
Once the computer system had all these examples, the researchers gave it silent videos of the same drumstick hitting other surfaces, and they instructed the computer system to pair an appropriate sound with the video.
To do this, the computer selects a pitch and loudness that fits what it sees in the video, and it finds an appropriate sound clip in its database to play with the video.
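The retrieval step described above — estimate the sound’s loudness profile from the video, then pull the closest-matching clip from a database — can be sketched very roughly. This is not the researchers’ actual method (they train a deep network on sound features predicted from video frames); the following is only a toy nearest-neighbour illustration using a plain RMS envelope, with made-up clip names:

```python
import numpy as np

def sound_features(audio, frame_len=512):
    """Crude per-frame loudness envelope, standing in for the
    learned sound features described in the article."""
    frames = audio[: len(audio) // frame_len * frame_len].reshape(-1, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))  # RMS per frame

def pick_clip(predicted, library):
    """Return the library clip whose envelope is closest
    (Euclidean distance) to the predicted one."""
    best, best_dist = None, float("inf")
    for name, feats in library.items():
        n = min(len(feats), len(predicted))
        dist = np.linalg.norm(feats[:n] - predicted[:n])
        if dist < best_dist:
            best, best_dist = name, dist
    return best

# Toy library: a sharp "wood hit" vs a diffuse "leaf rustle"
rng = np.random.default_rng(0)
wood = np.zeros(4096)
wood[:512] = rng.standard_normal(512) * np.linspace(1, 0, 512)  # decaying impact
leaves = rng.standard_normal(4096) * 0.2                        # steady rustle
library = {"wood": sound_features(wood), "leaves": sound_features(leaves)}

# A "predicted" envelope resembling an impact should retrieve the wood clip
predicted = sound_features(wood * 0.9)
print(pick_clip(predicted, library))  # → wood
```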
To demonstrate their accomplishment, the researchers then played half-second video clips for test subjects, who struggled to tell whether the clips included an authentic sound or one that the computer system had added artificially.
But the technology is not perfect, as MIT PhD candidate Andrew Owens, the lead author on the research, acknowledged. When the team tried longer video clips, the computer system would sometimes misfire and play a sound when the drumstick was not striking anything. Test subjects immediately knew the audio was not real.
And the researchers were able to get the computer to produce fitting sounds only when they used videos with a drumstick. Creating a computer that automatically provides the best sound effect for any video — the kind of development that could disrupt the sound-effects industry — remains out of reach for now.
Although the technology world has seen significant strides of late in artificial intelligence, there are still big differences in how humans and machines learn. Owens wants to push computer systems to learn more similarly to the way an infant learns about the world, by physically poking and prodding its environment. He sees potential for other researchers to use sound recordings and interactions with materials such as sidewalk cement as a step toward machines’ better understanding our physical world.
Elizabeth Parker and Paddy Kingsland from the BBC Radiophonic Workshop in 1979 demonstrate the use of tape loops and tape-replay setups. We hear Elizabeth Parker’s “bubble music” and Paddy Kingsland on the electric guitar with twin Studer tape recorders.
This excerpt is from the BBC documentary The New Sound of Music produced in 1979.
Paddy Kingsland demonstrates twin Studer recorders in a delay-replay setup that some might refer to as “Frippertronics” – named after Robert Fripp, I believe. Fripp may have used twin Revox machines in a similar way for some of his compositions. It is an interesting setup, possibly described in some Workshop writings from the 1960s.
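The twin-tape-recorder trick translates neatly to the digital domain: the distance between the record head and the replay head becomes a fixed sample delay, and feeding the replayed signal back to the record head produces the train of decaying echoes. A minimal sketch (not a model of any specific Workshop setup; delay and feedback values are arbitrary):

```python
import numpy as np

def tape_delay(dry, delay_samples, feedback=0.5, mix=0.5):
    """Simulate a two-machine tape loop: the 'record head' writes the
    input plus an attenuated copy of what the 'replay head' read
    delay_samples earlier."""
    out = np.zeros(len(dry))
    loop = np.zeros(len(dry))  # the tape running between the machines
    for n in range(len(dry)):
        delayed = loop[n - delay_samples] if n >= delay_samples else 0.0
        loop[n] = dry[n] + feedback * delayed   # re-record with feedback
        out[n] = (1 - mix) * dry[n] + mix * delayed
    return out

# A single click becomes a train of echoes, each half the level of the last
click = np.zeros(100)
click[0] = 1.0
echoes = tape_delay(click, delay_samples=25, feedback=0.5, mix=1.0)
print(np.nonzero(echoes)[0])  # → [25 50 75]
```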
The Radiophonic Workshop was closed in March 1998, although much of its traditional work had already been outsourced by 1995.
We have more on the Radiophonic Workshop elsewhere in this blog.
The techniques initially used by the Radiophonic Workshop were closely related to those used in musique concrète; new sounds for programs were created by using recordings of everyday sounds such as voices, bells or gravel as raw material for “radiophonic” manipulations. In these manipulations, audio tape could be played back at different speeds (altering a sound’s pitch), reversed, cut and joined, or processed using reverb or equalisation. The most famous of the Workshop’s creations using ‘radiophonic’ techniques include the Doctor Who theme music, which Delia Derbyshire created using a plucked string, 12 oscillators and a lot of tape manipulation; and the sound of the TARDIS (the Doctor’s time machine) materialising and dematerialising, which was created by Brian Hodgson running his keys along the rusty bass strings of a broken piano, with the recording slowed down to make an even lower sound.
Much of the equipment used by the Workshop in the earlier years of its operation in the late 1950s was semi-professional and was passed down from other departments, though two giant professional tape-recorders (which appeared to lose all sound above 10 kHz) made an early centrepiece. Reverberation was obtained using an echo chamber, a basement room with bare painted walls empty except for loudspeakers and microphones. Due to the considerable technical challenges faced by the Workshop and BBC traditions, staff initially worked in pairs with one person assigned to the technical aspects of the work and the other to the artistic direction.
THE SOUND OF CAPITAL
The new BBC drama Capital is edited by an ex-colleague of mine, Philip Kloss, and the dialogue editor is, by coincidence, the man who designed the sound effects software we use here at LSFM.
This recent article is a fascinating insight into some of the sound post challenges.
To create an authentic soundscape for BBC drama Capital, dubbing mixer Howard Bargroff took a trip to the part of south London in which it is set,
writes George Bevir [from an article first published in Broadcast Online]
TX 9pm, Wednesdays, from 24 November, BBC1
Length 3 x 60 minutes
Dubbing mixer/ FX editor Howard Bargroff
Foley editor Stuart Bagshaw
Dialogue editor Peter Gates (two episodes); Michele Woods (one episode)
ADR supervisor Kallis Shamaris
FX editor Mike Wabro (one episode)
Picture post Technicolor
Director Euros Lyn
Writer John Lanchester
Lesley Sharp (Mary)
Stepping out of the studio to capture authentic audio is not always without peril, no matter how inconspicuous the individual or discreet the recording device. A few years ago, dubbing mixer Howard Bargroff needed some crowd noise for a rap album he was working on but a trip to a pub at closing time to record the sound of a pack of people nearly resulted in him being “lynched” by the suspicious boozers.
“You have to be careful you don’t look like a psychopath, but people can still be suspicious,” he says.
Fortunately for Bargroff, his trip to Clapham to capture the sounds of south London for BBC drama Capital passed by with little more than a few sideways glances.
The three-parter, which is based on John Lanchester’s novel, is a portrait of a road in Clapham that is transformed by rising property prices and then rocked by an anonymous hate campaign. Bargroff’s brief was to give London a presence so that the city becomes a character in its own right and “leaks” into every scene.
“It’s a contemporary piece about the gentrification of London, so I went to south London and made a bunch of recordings,” says Bargroff. “As I moved around Clapham, I saw microcosms of the plot – people interacting with builders, posh mums coming out of buildings, and so on. It was as though the book was coming alive and I was recording it.”
Bargroff says his recordings, made using a Zoom H5, became the “sonic backbone” of the series, comprising around 50% of the background sound.
“We used lots of bridging sounds, such as sirens, between cuts. At first I thought the recordings were a luxury, but they soon became a necessity; those recordings of Clapham High Street, of a park, of close and distant traffic, planes passing overhead and sirens became themes throughout the episodes. It helped to create the feeling that in London you are never more than a few streets away from a busy high street.”
Bargroff, who worked at De Lane Lea, Future Post, Videosonics and Pepper before going freelance, waved goodbye to the city a couple of years ago when he moved from Battery Studios in Willesden to a studio attached to his home in Woburn Sands. Since then, through his company Sonorous, he has mixed both series of Broadchurch (ITV) and From There To Here (BBC1) from home, and as a freelancer completed the pre-mix for Fortitude (Sky Atlantic) and Luther (BBC1) in his home studio.
Bargroff’s standard approach for an hour of drama is to spend three days premixing at home, followed by two client-attended days at a dry-hire facility in London. That means he needs to keep his home set-up as up-to-date as possible so that it is compatible with other facilities (see box) and he can quickly pick up where he left off. For Capital, he completed the final mix with director Euros Lyn at the “excellent” Hackenbacker, which also provided the ADR and Foley.
“Dru Masters created a fantastic score and delivered it quite early so I had time to weave it and the music treatments in. That meant when I turned up with Euros at Hackenbacker on the first day, we could play the whole episode, so we had quite a lot of review time. I like the three-day premix because it means you can turn up with something cohesive. I try to protect that 3:2 approach; most jobs fit that template and clients are usually happy to accommodate it.”
CAPITAL KEY KIT
Bargroff’s home studio is equipped with Avid Pro Tools HDX2, an Icon D-Command 16-fader desk and PMC twotwo active monitors. Plug-ins are “industry standard”, including Audio Ease Altiverb, Waves WNS, iZotope RX, ReVibe and Speakerphone.
Projects are transferred between facilities using portable drives, with Cronosynch software to synchronise work completed at home with a transfer drive and a local drive in a dry-hire facility. “I keep the transfer drive synched to both ends so at any point during the job I have a mirror of the media in two locations, which is great for back-up. At the end of the job, everything is backed up to a Raid system for archiving.”
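I don’t know how Cronosynch works internally, but the basic one-way mirror it performs — copy anything new or newer from one drive to another so both ends hold the same media — can be sketched as follows. The `mirror` helper and the folder names are purely illustrative; real sync tools also handle deletions and conflicts, which this omits:

```python
import shutil
import tempfile
from pathlib import Path

def mirror(src: Path, dst: Path) -> int:
    """One-way mirror: copy any file that is missing from dst or
    newer on src. Returns the number of files copied."""
    copied = 0
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, target)  # copy2 preserves timestamps
                copied += 1
    return copied

# Demo: a 'home studio' drive mirrored onto a 'transfer' drive
home = Path(tempfile.mkdtemp())
transfer = Path(tempfile.mkdtemp())
(home / "mix").mkdir()
(home / "mix" / "ep1_premix.wav").write_bytes(b"\x00" * 16)
print(mirror(home, transfer))  # first pass copies the file → 1
print(mirror(home, transfer))  # second pass finds nothing newer → 0
```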
Bargroff also uses the Soundminer librarian program for managing his library of work. “It can scan multiple terabytes in a few hours and give you complete breakdown, or you can do a keyword search. All my libraries are well organised, but without that software I wouldn’t be able to find a thing.”
(The copyright to this content lies with Broadcast Online and is reproduced here under educational licence)
Original article is here http://www.broadcastnow.co.uk/techfacils/capital-bbc1/5097498.article
We get all the best visitors to Lincoln School of Film and Media.
This is Adrian Bell, a film and TV recordist, just back from filming the new Da Vinci Code film ‘Inferno’, directed by Ron Howard.
Check out his CV at www.adrianbell.net He won a BAFTA for best sound in 2014 for Dancing on the Edge
He’s in town to be interviewed by BBC Radio Lincs tomorrow about the feature film ‘Everest’, which he worked on earlier in 2014, directed by Baltasar Kormakur. And yes, I took the opportunity of giving him a tour and collared him to come back and talk to students some time soon. He’s on BBC Lincolnshire sometime after 11.00am tomorrow.
Adrian lives in London but is originally from Lincolnshire. He has a wealth of experience and is keen to come and contribute to the Audio Production course content if he can.
Here’s a short photo slideshow.
I watched INTERSTELLAR last night. The soundtrack was (as you would expect from a Christopher Nolan film) electrifying.
Music was composed by Nolan’s composer of choice Hans Zimmer.
As I watched I realised that the predominant instruments were strings and what sounded like the biggest church organ in the world! I wasn’t far wrong, it seems.
“Over the course of the film, the core five-note melody (the soundtrack is released on November 17th, but for a taste listen to Trailer #3) is expressed in different ways. The score is an ensemble effort combining 34 strings, 24 woodwinds, four pianos, and 60 choir singers, all of which get their time to sound off. But the starring, and most meaningful voice, is the 1926 four-manual Harrison & Harrison organ, currently housed at the 12th-century Temple Church in London and played in the movie by its director of music, Roger Sayer”(i)
What also became apparent is that on many occasions the audio track was so loud that, when actors were speaking, you could not hear clearly what they were saying. As I watched, my sound editor’s head told me this must be deliberate: Nolan must WANT us to strain to hear what they’re saying, to make the scene tense, threatening or downright overwhelming. I certainly thought at times that the cinema sound system was going to blow. I was being affected physiologically; my heart rate was increasing. At other, quieter times, however, the main dialogue was relatively low in level – almost hard to hear what people were saying. This huge dynamic range (only available in cinemas) made me wonder how it will sound on DVD/streaming etc. You can’t really watch a Nolan film anywhere other than the cinema, it seems to me.
Afterwards some research into the sound led me to an article on this very subject, which confirms much of what I thought and far more.
Here’s more from the article:-
Hans Zimmer’s score drowns out dialogue and has already broken an Imax theater, but there’s thematic significance in all that noise
“As Zimmer recently told the Film Music Society, the organ was chosen for its significance to science: From the 17th century to the time of the telephone exchange, the pipe organ was known as the most complex man-made device ever invented. Its physical appearance reminded him of space ship afterburners. And the airiness of the sound slipping through pipes replicates the experience of suited astronauts, where every breath is precious (a usual preoccupation with sci-fi movies that is taken very literally in Zimmer’s music, which also features the exhalations of his human choir).
Zimmer’s score—which alternates between a 19th-century Romanticism and 20th-century Minimalism—of course has an element of spirituality to it. But the organ does more than just recall churches. From the movie’s earliest moments, it performs some very necessary narrative legwork for the overburdened screenplay. When it kicks in as Cooper chases down an Indian surveillance drone, a light touch on the organ keys, paired with rousing strings, creates a whirling, ethereal sound that channels Cooper’s interior life. The giddy tone it sets demonstrates that Cooper is a risk-taker and adventurer, which solves the screenplay’s early problem of establishing emotional motive for Cooper to leave his children.
As organs are wont to do, this one resonates. And there are moments when the decibels at which it does can only be described as an action-movie crutch. The organ gets a noticeably more heavy-handed touch as the plot becomes ever-more preposterous. It blasts when the elder Professor Brand, played by Michael Caine, hands over the keys to the spaceship—and his life’s work—to a farmer (Cooper) who presumably hasn’t piloted anything except a plow in a while. It booms when Anne Hathaway’s younger Dr. Brand shakes hands with “Them,” heavily foreshadowing events to come. Some of these moments necessitate the extra spiritualistic oomph, but it’s often the case that when the plot turns implausible, Nolan and Zimmer ramp up the organ.”(i)
X FACTOR 2014
VT editors often have to wrestle with a huge amount of sound information, especially on shows that have discreet microphones all over the place – such as The X Factor.
Editor Janci Kovic recently took this screengrab of his final timeline for a Bootcamp episode of The X Factor – Bootcamp being the episode after the auditions.
Having the ability to cut off words, change the order of what the judges are saying and solo the backstage reactions at the same time was very helpful in getting the story done.
THE AUDIO TRACKS SHOWN INCLUDE:
Jury – 4 channels; crowd – 2 channels; singer’s port, mic and instrument – 3 channels; band – 12 channels; backstage with moderator – 3 channels; stage mix – 2 channels; music – 6 channels; SFX – 4 channels; VO and other ports.
THE VIDEO was recorded on QUADRUS
BBC launches new MUSIC site with God Only Knows,
a star-studded film
featuring ‘The Impossible Orchestra’
I heard about this on the way in to work this morning – but I didn’t know what the event was going to be until sitting down with the family at 8pm. It reminded me of the great ‘Perfect Day’ BBC promotional film. I have gathered these comments from various sources available online. The song was broadcast simultaneously on Tuesday 7th October 2014 on BBC One, Two, Three, Four and Radio 1, 2, 4, 6 and 5 Live.
The track, which will also be released in aid of Children in Need, features 27 artists across all musical genres. They include Sir Elton John, Stevie Wonder, Chris Martin, Sam Smith, Brian May, Jamie Cullum and Nicola Benedetti.
God Only Knows has reached almost mythical status in the pop canon. Written and produced by Brian Wilson with lyricist Tony Asher and younger brother Carl Wilson on vocals, it was released in 1966 as part of The Beach Boys’ Pet Sounds album. It reached Number 2 in the UK and Number 39 in the US Charts. It has become one of the most lauded tracks of all time. Rolling Stone placed it at 25 in their 500 Greatest Songs of All Time and in 2006, Pitchfork magazine crowned God Only Knows as the best song of the 1960s.
BBC Music will encompass TV and radio programming, digital services and schemes to support emerging talent including the introduction of classical music to UK primary schools. The song’s original writer, Brian Wilson, also features on the track, along with the BBC Concert Orchestra. The collective group of musicians has been named the Impossible Orchestra. Bob Shennan, director of BBC Music, said: “This ‘impossible’ orchestra is a celebration of all the talent, diversity and musical passion found every single day throughout the BBC.”
The line by line breakdown of singers is as follows:
BBC Concert Orchestra
Martin James Bartlett – celeste
Pharrell Williams – I may not always love you
Emeli Sandé – But as long as there are stars above you
Elton John – You never need to doubt it
Lorde – I’ll make you so sure about it
Chris Martin – God only knows what I’d be without you
Brian Wilson – If you should ever leave me
Florence Welch – Well life would still go on believe me
Kylie Minogue – The world could show nothing to me
Stevie Wonder – So what good would living do me
Eliza Carthy – God only knows what I’d be without you
Nicola Benedetti – violin
Jools Holland – piano
Brian May – electric guitar
Jake Bugg – lalalala
Katie Derham – violin
Tees Valley Youth Choir – God only knows
Alison Balsom – piccolo trumpet
One Direction – God only knows what I’d be without you
Jaz Dhami – God only knows what I’d be without you
Paloma Faith – God only knows what I’d be without you
Chrissie Hynde – God only knows
Jamie Cullum – God only knows what I’d be without you
Baaba Maal – God only knows
Danielle de Niese – God only knows what I’d be without you
Dave Grohl – God only knows
Sam Smith – God only knows what I’d be without you
Brian Wilson – God only knows what I’d be without you
THE TELEGRAPH: the Future of Music on BBC
THE GUARDIAN: BBC MUSIC LAUNCH
Sources: BBC Media Centre, BBC YOUTUBE Channel, BBC NEWS WEBSITE
(accessed 7th October 2014)
Richard Hastings-Hall visited Audio Production level 2 students today to talk about ‘dubbing mixing’, in particular mixing for medium budget daytime drama and the technical and creative constraints that working on shows like this can have. They are often handled very differently to other dramas, documentaries and television series etc.
For example the directors of these daytime dramas are not paid to be present at the final mixing session – it’s only the Exec Producer who signs off the mix.
Richard brought his Pyramix set-up (made by Merging Technologies) with him, which sadly did have some technical issues – but this was a good example of how ‘anything that can go wrong will go wrong’. A thank-you must go to Luke Johnston, who showed his skill in drive re-mapping!
Some students found it reassuring that it wasn’t ‘just them’.
Richard mentioned metering and loudness, and the need for good adherence to technical standards.
To find out more about the BBC delivery requirements look here
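As a very rough illustration of what a loudness compliance check is measuring: broadcast delivery specs in Europe target −23 LUFS under EBU R128. Real R128 metering applies K-weighting and gating, which this plain mean-square sketch omits, so it is nothing like a compliant meter – just a toy showing the arithmetic of a level-versus-target check:

```python
import numpy as np

TARGET_LUFS = -23.0  # EBU R128 broadcast target
TOLERANCE = 1.0      # a typical +/- 1 LU delivery tolerance

def crude_level(samples):
    """Very rough level estimate in dB from the mean-square value.
    Real R128 loudness adds K-weighting and gating, omitted here."""
    ms = np.mean(samples ** 2)
    return 10 * np.log10(ms) if ms > 0 else -np.inf

def check_delivery(samples):
    level = crude_level(samples)
    return abs(level - TARGET_LUFS) <= TOLERANCE, level

# A 1 kHz sine scaled so its mean-square level sits at exactly -23 dB
t = np.linspace(0, 1, 48000, endpoint=False)
tone = np.sin(2 * np.pi * 1000 * t) * 10 ** (-23 / 20) * np.sqrt(2)
ok, level = check_delivery(tone)
print(ok, round(level, 1))  # → True -23.0
```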
Richard revealed that often with quick-turnaround drama shows like Doctors, the sync sound recordings are not always perfect and the crew often doesn’t have time to go again. So very often
the dubbing team are left to ‘fix it in post’. Alternative lines of dialogue are hunted down from the rushes, smoothing techniques are used and generally the pressures are such that all this must be done in one 12-hour session. No foley or ADR sessions are possible.
“In Doctors we don’t have time for foley sessions, so we have to be very resourceful when it comes to our use of time. Much of what we do is fixing problems”
Louise Wilcox, another dubbing mixer, was featured in an article on the Institute of Professional Sound website which may be of interest.
On another occasion, during a Jane Austen-themed episode, he went overboard on a fight scene and had to remix it due to a topical news event which happened close to transmission.
Richard also talked about Brinkburn Street for the BBC, which presented some unusual sound dilemmas, as it was set in both the present day and the 1930s, so sometimes there were horses and carts outside the houses and sometimes jet engines and traffic. See iPlayer.
Richard has been a dubbing mixer for over 20 years and has mixed 717 episodes of Doctors. He is currently freelance, based in Nottingham.
MONDAY 16th December 2013 at 11.00am on BBC Radio 4
In an exclusive interview for Radio 4 David Attenborough talks to Chris Watson about his life in sound.
One of Sir David’s first jobs in natural history film-making was as a wildlife sound recordist. Recorded in Qatar, David Attenborough is with wildlife sound recordist Chris Watson, there to make a film about a group of birds he is passionate about, the birds of paradise. It is in Qatar where the world’s largest captive breeding population is, and it is in this setting that Chris Watson takes Sir David back to the 1950s and his early recording escapades, right through to today, where David Attenborough narrates a series of Tweet of the Day episodes on Radio 4 across the Christmas and New Year period.