Shamshiel

Reputation: 2211

Detect characteristics in audio stream

I want to develop an algorithm that can recognize characteristics in audio captured from a line-in/mic input. The audio stream will be music, and I want to extract characteristics that distinguish songs from each other. By distinguishing I mean being able to tell the genres of the songs apart.

One crucial thing that I absolutely want to detect is the song's time signature (its bar/beat structure). For example, I want to know whether a song is in 3/4 time.

The only helpful articles that I found were about BPM detection but that is not enough to distinguish a song from another song.

The FFT is a good start to get different characteristics from an audio stream but I don’t know where to begin. Is it possible to get the bar/beat with the FFT? Are there any good tutorials/code examples about this?

Is the FFT enough to get good characteristics of an audio stream or are there any other algorithms that are good for getting characteristics in audio streams?

Preferably I would do this in C# because that’s the programming language I have most experience with. Is this possible in C# or is another language better?

To sum up, I'm looking for any information about extracting characteristics from an audio stream to get the beat/bar and other information that distinguishes songs.

Upvotes: 5

Views: 2113

Answers (3)

Drew Noakes

Reputation: 311305

The open source aubio library extracts features from audio. It's written in C, but it may serve as a reference for a managed implementation. Alternatively, you could call into it from C# via P/Invoke.

aubio is a tool designed for the extraction of annotations from audio signals. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio.

Upvotes: 0

Peter Webb

Reputation: 691

A Fourier Transform will tell you the frequencies in the sound. This may be sufficient to tell you the key in which it was recorded. I doubt it will tell you much more than this.
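As a rough illustration of what "the frequencies in the sound" means in practice, here is a minimal sketch that finds the strongest frequency in a block of samples. It uses a naive DFT from the Python standard library rather than a real FFT library, and the names `magnitude_spectrum` and `dominant_frequency`, the 8000 Hz sample rate, and the synthetic 440 Hz test tone are all assumptions made for the example, not anything from the question.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT. Returns the magnitude of bins 0..n//2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the DC bin."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)


# Synthetic test tone (an assumption for the demo, not real music):
# 400 samples at 8000 Hz gives 20 Hz per bin, so 440 Hz lands on bin 22.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(400)]
peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)
```

For real audio you would run this over short, overlapping windows and use an FFT implementation, since the naive loop above is far too slow for streaming input.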

Software (like Shazam) can identify two pieces of recorded music as being the same. You want to do something different: you want to extract meaning in the form of a classification. Compare this to voice recognition; it is a similar problem. Music is actually much harder, as there are often several instruments involved. Our brains can separate out the individual instruments (drums, guitars) using very sophisticated pattern recognition and then use those individual instruments to determine meter and beat, just as we can follow a conversation with the TV sound on. Computers can't decompose sounds into separate voices (yet), and simply hear one continuous sound. This makes me think that extracting meaningful information (beat, meter) will have to wait at least until we can resolve sound into separate "voices" on computers.

What you want to do will be possible one day, and will be great. But I think we are still some distance off; perhaps when computers can interpret speech fluently they will also be able to interpret music fluently. Maybe in 10 years.

Upvotes: 0

Drew Noakes

Reputation: 311305

I enjoyed reading the related articles by this blogger:

http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/

The author discusses fingerprinting songs. If you labelled a set of songs as having the qualities you're looking for and then fed the data into some kind of learning algorithm/classifier, you may have some success.
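To make the "label some songs, then feed them to a classifier" idea concrete, here is a minimal sketch of a nearest-centroid classifier. Everything specific in it is an assumption for illustration: the two-element feature vectors, the labels `"slow"` and `"fast"`, and the helper names `train_centroids` and `classify`. The numbers are toy values scaled to the range 0 to 1, not measurements from real songs, and a real system would need far better features and a stronger learning algorithm.

```python
import math
from collections import defaultdict


def train_centroids(labelled):
    """Average the feature vectors of each label into one centroid."""
    groups = defaultdict(list)
    for features, label in labelled:
        groups[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Toy training set: (hypothetical normalised features, label).
training = [
    ([0.20, 0.10], "slow"),
    ([0.30, 0.20], "slow"),
    ([0.80, 0.70], "fast"),
    ([0.90, 0.90], "fast"),
]
centroids = train_centroids(training)
guess = classify(centroids, [0.25, 0.15])
print(guess)
```

Whether something this simple separates genres at all depends entirely on the features you extract, which is the hard part of the problem.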

I do not think this is a solved problem, and so giving you a categorical answer is not possible, as far as I know.

Good luck!

Upvotes: 3
