Reputation: 21
I need some advice on an idea that I've had for a university project.
I was wondering if it's possible to split an audio file into different "streams" from different audio sources. For example, split the audio file into: engine noise, train noise, voices, different sounds that are not there all the time, etc.
I wouldn't necessarily need to do this from a programming language (although that would be ideal); doing it manually with sound processing software such as Sound Forge would work as well. I need to know if this is possible first, though. I know nothing about sound processing.
After the first stage is complete (separating the sounds), I want to determine whether one of the processed sounds exists in another audio recording. The purpose would be sound detection. For an (ideal) example, take the car engine sound, match it against another file, and determine whether or not that audio is a recording of a car's engine. It doesn't need to be THAT precise; I guess detecting a sound that is not constant, like a honk, would be alright as well.
I will do the programming part; I just need some pointers on what to look for (software, math, etc.). As I am no sound expert, this would be a really interesting project, if it's possible.
Thanks.
Upvotes: 2
Views: 2354
Reputation: 6324
This problem of splitting sounds based on source is known in research as (Audio) Source Separation or Audio Signal Separation. If no further information is available about the sound sources or how they have been mixed, it is a Blind Source Separation problem. There are hundreds of papers on these topics.
However, for the purpose of sound detection, it is typically not necessary to separate sounds at the audio level. Very often one can (and will) do detection on features computed on the mixed signal. Search the literature for Acoustic Event Detection and Acoustic Event Classification.
For an introduction to the subject, check out a book like Computational Analysis of Sound Scenes and Events.
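To give a rough idea of what "features computed on the mixed signal" can look like, here is a minimal sketch in Python with NumPy. It cuts the recording into overlapping frames, computes the log energy in a few frequency bands per frame, and flags frames where one band rises well above its median level. The function names (`log_band_energies`, `flag_events`), the frame sizes, the linear band split and the `margin` threshold are all assumptions made for illustration; real systems usually use mel-scaled features and a trained classifier rather than a fixed threshold.

```python
import numpy as np

def log_band_energies(signal, frame_len=1024, hop=512, n_bands=16):
    """Return an (n_frames, n_bands) array of log band energies.

    Hypothetical helper: linear band split, Hann window, no mel scaling.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Sum the power inside n_bands roughly equal-width frequency bands.
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energies + 1e-10)

def flag_events(features, band, margin=3.0):
    """Flag frames whose log energy in `band` exceeds the median by `margin`.

    The median acts as a crude estimate of the stationary background level.
    """
    track = features[:, band]
    return track > np.median(track) + margin
```

Note that nothing here separates the sources: a short tonal event such as a honk simply shows up as a jump in one or two bands on top of the background.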
Upvotes: 1
Reputation: 1
Correlate reference signals against the audio stream. Correlation can be done efficiently using FFTs. The output of the correlation calculation can be thresholded and 'debounced' in time for signal identification.
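A minimal sketch of this idea in Python with NumPy follows. The function names (`fft_correlate`, `normalized_scores`, `detect`), the energy normalization, the threshold of 0.7 and the hold time are my own assumptions for illustration, not a tuned detector. Normalizing by the local energy keeps the score between -1 and 1, so a single threshold works regardless of how loud the recording is.

```python
import numpy as np

def fft_correlate(stream, ref):
    """Cross-correlate ref against stream via FFTs, for lags where ref fits fully."""
    n = len(stream) + len(ref) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap-around
    S = np.fft.rfft(stream, nfft)
    R = np.fft.rfft(ref, nfft)
    full = np.fft.irfft(S * np.conj(R), nfft)
    return full[: len(stream) - len(ref) + 1]

def normalized_scores(stream, ref):
    """Correlation divided by the energies of ref and of each stream window."""
    stream = np.asarray(stream, dtype=float)
    ref = np.asarray(ref, dtype=float)
    corr = fft_correlate(stream, ref)
    # Sliding-window energy of the stream from a cumulative sum of squares.
    csum = np.concatenate(([0.0], np.cumsum(stream ** 2)))
    win_energy = np.maximum(csum[len(ref):] - csum[:-len(ref)], 0.0)
    return corr / (np.sqrt(win_energy * np.sum(ref ** 2)) + 1e-12)

def detect(stream, ref, threshold=0.7, hold=None):
    """Return sample offsets where ref is detected in stream.

    'Debouncing': after a detection, further threshold crossings are
    ignored for `hold` samples (default: the length of ref).
    """
    hold = len(ref) if hold is None else hold
    scores = normalized_scores(stream, ref)
    hits = []
    last = -hold
    for i in np.flatnonzero(scores > threshold):
        if i - last < hold:
            continue  # still inside the hold-off period of the previous hit
        peak = int(i + np.argmax(scores[i : i + hold]))
        hits.append(peak)
        last = peak
    return hits
```

Calling `detect(recording, reference)` with two 1-D arrays at the same sample rate returns the sample positions of the matches. Keep in mind that this only finds near-exact copies of the reference waveform; it will not recognize a different car's engine or the same honk recorded under different conditions.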
Upvotes: 0
Reputation: 180303
It's extremely difficult to do automated source separation from a single audio stream. Your brain is uncannily good at this task, and it also benefits from a stereo signal.
For instance, voice is full of signals that aren't there all the time. Car noise has components that are quite stationary, but gear changes are outliers.
Unfortunately, there are no simple answers.
Upvotes: 0