Reputation: 660

unsplitting audio file in python

First of all, I've never done any audio coding at all. I'm trying to figure out how to start on this little side project. Imagine you have an MP3 song split into three chunks: how can I compare the beginning of one file to the end of another to see whether they are supposed to be played in that order?

Something like playing an album that is continuous across the playback, but you don't have the track numbers.

My idea is to compare the last part of each piece of audio with the beginning of another and keep looking for matches until all the pieces have been matched.

Can anyone point me in the right direction?

Upvotes: 0

Answers (2)

flies

Reputation: 2135

I take it you're looking to figure out how to line the tracks up purely by how they sound; that is, the only information you have is audio information. I don't know Python, but I know digital audio; here's an algorithm you could use. Basically, you need a difference metric to compare the start of each track to the end of each other track. Possible metrics include tempo, amplitude, and timbre. The method I suggest basically tries to match up the waveforms.

Any method must assume that there is no silence in between tracks and that they flow from one to another without gaps. Unfortunately, this is not true for most albums. If there's silence between songs, there's no way of doing it aside from going to discogs.com or something. I'd guess that a solution like that is probably going to be less work, and certainly more reliable than any script you could cook up in a reasonable amount of time.

Nevertheless, here's my suggestion:

1. For each file, get the first and last sample values as well as the derivative/slope at each of those points.
2. Predict what the previous/next sample will be using the information you collected in part 1.
3. Compare the predictions for each beginning/end pair to see how closely they match. (Comparing predicted and true values may not be sufficient. You probably also need to compare the derivatives.)
4. Pair off the best matches (in order from best to worst until your list is complete).
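Here is a minimal sketch of the four steps above in pure Python. It assumes each track has already been decoded into a plain list of mono sample values (at least two samples each); the function names are hypothetical, the slope is a simple one-sample difference, and the cost compares sample values only (no derivative term). One detail the steps leave implicit: when pairing greedily you have to reject a pair that would close a loop, otherwise the tracks can end up in a cycle instead of a single chain.

```python
import itertools

def edge_features(samples):
    """Return (first, first_slope, last, last_slope) for one track's samples."""
    first, last = samples[0], samples[-1]
    first_slope = samples[1] - samples[0]   # forward difference at the start
    last_slope = samples[-1] - samples[-2]  # backward difference at the end
    return first, first_slope, last, last_slope

def join_cost(a, b):
    """Squared prediction error for playing track a immediately before track b."""
    _, _, a_last, a_slope = edge_features(a)
    b_first, b_slope, _, _ = edge_features(b)
    predicted_next = a_last + a_slope    # extrapolate a forward by one sample
    predicted_prev = b_first - b_slope   # extrapolate b backward by one sample
    return (predicted_next - b_first) ** 2 + (predicted_prev - a_last) ** 2

def order_tracks(tracks):
    """Greedily pair ends to starts, best match first, then chain the pairs."""
    n = len(tracks)
    candidates = sorted(
        (join_cost(tracks[i], tracks[j]), i, j)
        for i, j in itertools.permutations(range(n), 2)
    )
    successor, has_predecessor = {}, set()
    for _cost, i, j in candidates:
        if i in successor or j in has_predecessor:
            continue  # this end or this start is already used
        # Reject the pair if following j's chain leads back to i (a cycle).
        k = j
        while k in successor:
            k = successor[k]
        if k == i:
            continue
        successor[i] = j
        has_predecessor.add(j)
        if len(successor) == n - 1:
            break
    start = next(i for i in range(n) if i not in has_predecessor)
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order
```

For example, with a smooth test signal `s = [0.001 * t * t for t in range(300)]` cut into `a, b, c` of 100 samples each and passed as `[c, a, b]`, `order_tracks` returns `[1, 2, 0]`, i.e. a, then b, then c.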

You probably need to convert to WAV to do the above. If so, you could probably get away with a very low sample rate and bit depth to minimize conversion time and RAM usage. I don't know what tools are out there for Python, but if you could convert only the beginning and end of these files, that would certainly improve performance! In any case, you only need to store a fraction of a second from each end after conversion.
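Once a piece has been decoded to WAV (for example with an external decoder such as ffmpeg, `ffmpeg -i part1.mp3 -ac 1 -acodec pcm_s16le part1.wav`; the file names are just placeholders), Python's standard `wave` module can read only the two ends without loading the middle. This sketch assumes 16-bit mono PCM; `read_wav_edges` and its `seconds` parameter are hypothetical names.

```python
import array
import sys
import wave

def read_wav_edges(path, seconds=0.05):
    """Return (head, tail): the first and last `seconds` of a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        n_frames = wav.getnframes()
        edge = min(n_frames, max(2, int(round(seconds * wav.getframerate()))))
        head = array.array("h", wav.readframes(edge))
        wav.setpos(n_frames - edge)  # skip straight to the tail without reading the middle
        tail = array.array("h", wav.readframes(edge))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        head.byteswap()
        tail.byteswap()
    return head.tolist(), tail.tolist()
```

Keeping only a few hundredths of a second from each end is plenty for the sample-and-slope comparison, and it keeps memory use tiny even for long tracks.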

Quantifying a "good match": To compare samples, you can use the square of the difference between prediction and truth, adding the squared differences for both ends of the pair. Adding a derivative comparison means that you have to figure out how to weight the derivative difference in comparison to the sample difference.
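Written out, one way to express that score for playing track A (samples a_0, ..., a_{N-1}) immediately before track B (samples b_0, b_1, ...), using one-sample differences as the slopes. The weight w is a hypothetical knob you would have to tune by ear; w = 0 gives the plain sample comparison.

```latex
\begin{aligned}
a' &= a_{N-1} - a_{N-2}, & \hat{b}_0 &= a_{N-1} + a',\\
b' &= b_1 - b_0, & \hat{a}_{N-1} &= b_0 - b',\\
C(A \to B) &= (\hat{b}_0 - b_0)^2 + (\hat{a}_{N-1} - a_{N-1})^2 + w\,(a' - b')^2
\end{aligned}
```

A lower C means a more plausible join; compute it for every ordered pair of tracks and pair off the smallest values first.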

Potential Problems

This algorithm should work well if you're working with CD-quality audio files. However, if something happened during conversion that lost information at the track ends, it would muck up the above: losing even a tiny fraction of a second of audio would completely ruin this approach!

Another potential sticking point is that if your slope is very high, it's likely that the sound you're looking at is noisy. If that's the case, then the comparison I suggest will be error prone. You can do an auto-correlation or something to see if your audio is noisy (low ACF for short time scales indicating noise) and then down-weight truth/prediction differences in favor of slope, or even just noisiness.
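A cheap version of that noisiness check is the normalized autocorrelation at lag 1 of a short window taken from each end: smooth audio scores close to 1, white noise close to 0. This is only a sketch; the function names are hypothetical and the 0.5 threshold is an arbitrary cut-off, not a tuned value.

```python
def lag1_autocorrelation(window):
    """Normalized autocorrelation at lag 1: near 1 for smooth audio, near 0 for white noise."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    energy = sum(x * x for x in centered)
    if energy == 0:
        return 1.0  # a constant (e.g. silent) window is perfectly predictable
    return sum(centered[i] * centered[i + 1] for i in range(n - 1)) / energy

def looks_noisy(window, threshold=0.5):
    """Hypothetical rule: flag a window as noisy when its short-lag ACF is low."""
    return lag1_autocorrelation(window) < threshold
```

When an end looks noisy, you would then down-weight the sample-prediction term for any pair involving that end.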

In general, you may want to weight the differences between truth and prediction based on how big a jump you predict. If you predict a big jump, then deviations from the predicted value should be considered with respect to the size of the jump, so that bigger deviations matter more when the predicted jump is small.
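One way to implement that weighting is to divide the squared deviation by the square of the predicted jump (here, the slope), with a floor so that a flat prediction doesn't divide by zero. `relative_error` is a hypothetical helper, and the floor value is an assumption; 1.0 is roughly one quantization step for integer samples.

```python
def relative_error(last, slope, actual_next, floor=1.0):
    """Squared prediction error scaled by the size of the predicted jump."""
    predicted = last + slope
    scale = max(abs(slope), floor)  # floor keeps a near-zero slope from blowing up the ratio
    return (actual_next - predicted) ** 2 / scale ** 2
```

The same 10-unit miss costs 100 when the predicted jump was 1 but only 0.01 when it was 100.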

Another approach, one that would be less sensitive to that problem, would be to do spectral analysis with an FFT so that your distance metric becomes the difference in amplitude of each frequency bin. This would be sensitive to transients (e.g. drum hits, guitar strums); using very small analysis windows might mitigate this difficulty. You could use this in addition to the above procedure, but only as a positive criterion: if a beginning/end pair is a good spectral match, that probably indicates a true pair, but if the spectra don't match, it's not informative, because transients are likely to have corrupted the data. Alternatively, you could use long windows so that you're assured of including whatever transients may be present on both ends of the comparison.
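A sketch of that spectral distance with NumPy, assuming two equal-length windows of mono samples, one from the end of a track and one from the start of another. The Hann window and the loudness normalization are my own choices, not part of the description above, and the function names are hypothetical; the window length is where the short-versus-long tradeoff comes in.

```python
import numpy as np

def magnitude_spectrum(window):
    """Magnitude of the real FFT of a Hann-windowed block of samples."""
    block = np.asarray(window, dtype=float)
    return np.abs(np.fft.rfft(block * np.hanning(len(block))))

def spectral_distance(end_window, start_window):
    """Euclidean distance between the normalized magnitude spectra of two equal-length windows."""
    a = magnitude_spectrum(end_window)
    b = magnitude_spectrum(start_window)
    # Normalize so that an overall loudness difference doesn't dominate the comparison.
    a /= np.linalg.norm(a) or 1.0
    b /= np.linalg.norm(b) or 1.0
    return float(np.linalg.norm(a - b))
```

Because only magnitudes are compared, two windows of the same tone score as close even if the waveforms are out of phase, which is what you want when the exact sample alignment is unknown.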

Ultimately, the technique you use is likely to depend on what kind of music you're working with. If you've got a hard rock album, then it's likely that the spectrum everywhere will be full of cymbals and distorted guitars, which will basically look the same anywhere. On the other hand, if you've got abrupt transitions that occur right at the beginning of a track, then nothing will work.

As I say, doing this "by hand" is likely to be the most reliable and even the fastest solution (considering development time), unless you're doing this to a very large set of MP3s.

Upvotes: 1

Martin Beckett

Reputation: 96139

It depends how it was split.

An MP3 file is a series of frames, each a header block followed by some data, so you can split the file at any header block and recombine the pieces by just concatenating them. There isn't necessarily anything in the header block to say what order it's in. ( http://en.wikipedia.org/wiki/MP3#File_structure )
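If the pieces really were cut at frame boundaries, rejoining them once you know the order is plain byte concatenation. Here is a minimal sketch (hypothetical helper names) that also checks whether a piece starts with an MPEG audio frame sync, i.e. 11 set bits. Note that a piece carrying its own ID3 tag at the start would fail that check, and such tags would end up in the middle of the joined file.

```python
import shutil

def starts_with_frame_sync(path):
    """True if the file begins with an MPEG audio frame sync (11 set bits)."""
    with open(path, "rb") as f:
        head = f.read(2)
    return len(head) == 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0

def join_mp3_parts(part_paths, out_path):
    """Concatenate MP3 pieces byte for byte, in the order given."""
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```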

If the MP3 files are separate tracks from an album, they will have ID tags that list the track number. There are a few Python MP3 libraries; see Accessing mp3 Meta-Data with Python.
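The simplest tag to read by hand is ID3v1.1: 128 bytes at the very end of the file starting with `TAG`, with the track number in byte 126 when byte 125 is zero. This sketch (hypothetical function name) handles only that case; most files today use ID3v2 tags at the start of the file, for which a library such as mutagen is a better choice.

```python
def id3v1_track_number(path):
    """Return the ID3v1.1 track number stored in the last 128 bytes, or None."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end of the file to learn its size
        if f.tell() < 128:
            return None
        f.seek(-128, 2)
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag present
    # ID3v1.1: a zero byte at offset 125 means offset 126 holds the track number.
    if tag[125] == 0 and tag[126] != 0:
        return tag[126]
    return None
```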

Edit: if you mean analysing the music so that you can tell whether one note is supposed to follow another, that's a little outside my expertise!

Upvotes: 2
