Reputation: 79
I have a (mostly) working program that compares two WAV files to see whether the smaller one occurs inside the bigger one. This is done in Java.
I do this by first making sure both WAV files are in canonical WAVE format. I then get a byte array of data out of each using AudioInputStream, reading the data in chunks of a fixed size (currently 4096 bytes). I take the first chunk of the smaller input and go through chunks of the same size in the larger input.
I copy these chunks into double arrays, take their FFTs, and use a correlate function to find a peak in the resulting array of correlation coefficients. I then go to the next chunk of the smaller input and check whether a similar peak appears.
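For reference, a minimal sketch of that front end might look like the following. It assumes 16-bit signed little-endian PCM after the canonical conversion, and the class and method names are illustrative, not the question's actual code:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch only: decode a WAV file and split it into fixed-size chunks of doubles.
// The conversion to 16-bit signed little-endian PCM stands in for whatever
// "canonical" conversion is already being done.
public final class WavChunks {

    static final int CHUNK_BYTES = 4096;   // chunk size used in the question

    static List<double[]> readChunks(File wav) throws Exception {
        AudioInputStream raw = AudioSystem.getAudioInputStream(wav);
        AudioFormat canonical = new AudioFormat(
                raw.getFormat().getSampleRate(), 16, raw.getFormat().getChannels(), true, false);
        AudioInputStream in = AudioSystem.getAudioInputStream(canonical, raw);

        List<double[]> chunks = new ArrayList<>();
        byte[] buf = new byte[CHUNK_BYTES];
        int read;
        while ((read = in.read(buf)) > 0) {
            double[] chunk = new double[read / 2];
            for (int i = 0; i < chunk.length; i++) {
                // little-endian 16-bit sample -> roughly [-1.0, 1.0)
                int lo = buf[2 * i] & 0xff;
                int hi = buf[2 * i + 1];            // high byte keeps its sign
                chunk[i] = ((hi << 8) | lo) / 32768.0;
            }
            chunks.add(chunk);
        }
        in.close();
        return chunks;
    }
}
```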
This works: the peaks are obvious when the files are the same, and most of the time the results are correct. I do not get false positives. I do, however, get false negatives.
This is because I'm not sure how to "align" the data. The smaller file could come from any point in the larger file. Most of the time this is caught by the way I chunk the data, but sometimes it is as if the files are different and no peak is found, even though the files should return a high correlation.
If I take one of the false-negative files (no peak) and tweak it a bit, snipping a few thousand bytes off the beginning or end, and run the program again, it suddenly finds the peak and the match is very clear. So the approach does work; it is just somehow not finding the peak where the correlation is obvious. The correlation function I have translates the FFTs so that they match, so I would have thought this covers everything, but clearly I am not covering all of the data.
I'm not sure how to "align" the chunk from the smaller file with wherever it occurs in the larger file so that the correlate function picks up on where the correlation occurs. Everything else works; I just need to eliminate the false negatives. Any advice?
Upvotes: 4
Views: 2695
Reputation: 1241
You can have a look at this paper. It explains the algorithm used by the Shazam service, which identifies music from a sample of a few seconds.
Here is another method, using self-organizing maps to cluster similar music. It is not exactly what you want to do, but it may give you ideas.
Upvotes: 0
Reputation: 35088
This is called a matched filter. Your implementation is suffering because of the chunking. Traditionally, you take the input as a continuous stream, extract a chunk starting at each sample, and do the correlation. So if your input is 10k samples long, you end up running the filter 10k times, each time taking 4k samples into the filter (in your example). However, this is slow. There are a couple of ways to speed things up:
Use small chunks, like 256 points, to make the FFT computations faster. Your correlations probably won't look quite as nice, leading to more false positives, but maybe you can make a list of possible matches and go back and look with bigger chunks.
Rather than taking buffers starting from every sample in the input, take 4k buffers starting at every 512th sample, say, and do the correlations (similar to Marcelo Cantos's suggestion in his comment); a rough sketch of this approach follows below. Then look for peaks within the 512 samples around the middle, since the time shift will cause the spike to shift. Also, the extra non-correlated samples at the edges will keep the peak from reaching its full value, so you'll need to relax that constraint if you have it. Again, this might lead to more false positives, so you again have to resort to a list approach.
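Here is a rough sketch of that hopped-window idea in Java (the question's language): slide a 4096-sample window through the long signal in 512-sample steps and cross-correlate each window against a pre-computed template spectrum. The FFT routine, class name, and window/hop constants are my own choices, not code from the question.

```java
// Sketch only: windowed FFT cross-correlation with a hop of 512 samples.
// Assumes both signals are already decoded to double[] arrays.
public final class WindowedMatch {

    static final int WINDOW = 4096;  // power of two, as in the question's chunking
    static final int HOP = 512;

    /** In-place iterative radix-2 FFT; re/im length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) {             // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {         // butterfly passes
            double ang = 2 * Math.PI / len * (inverse ? 1 : -1);
            double wr = Math.cos(ang), wi = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double cr = 1, ci = 0;
                for (int k = 0; k < len / 2; k++) {
                    int a = i + k, b = a + len / 2;
                    double vr = re[b] * cr - im[b] * ci;
                    double vi = re[b] * ci + im[b] * cr;
                    re[b] = re[a] - vr; im[b] = im[a] - vi;
                    re[a] += vr;        im[a] += vi;
                    double nr = cr * wr - ci * wi;
                    ci = cr * wi + ci * wr;
                    cr = nr;
                }
            }
        }
        if (inverse) for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
    }

    /** Peak of the circular cross-correlation between one window of `signal` and the template. */
    static double windowPeak(double[] signal, int offset, double[] tplRe, double[] tplIm) {
        double[] re = new double[WINDOW], im = new double[WINDOW];
        System.arraycopy(signal, offset, re, 0, WINDOW);
        fft(re, im, false);
        for (int k = 0; k < WINDOW; k++) {               // window spectrum * conj(template spectrum)
            double r = re[k] * tplRe[k] + im[k] * tplIm[k];
            double i = im[k] * tplRe[k] - re[k] * tplIm[k];
            re[k] = r; im[k] = i;
        }
        fft(re, im, true);                               // back to the lag domain
        // For one of the hopped windows the template's true start falls within the first
        // HOP lags, so it is enough to look there; the wrapped-around tail of the template
        // lowers the peak somewhat, which is the edge effect mentioned above.
        double peak = 0;
        for (int lag = 0; lag < HOP; lag++) peak = Math.max(peak, re[lag]);
        return peak;
    }

    /** Hop through the long signal; return the window offset with the strongest correlation peak. */
    static int bestWindow(double[] template, double[] signal) {
        double[] tplRe = new double[WINDOW], tplIm = new double[WINDOW];
        System.arraycopy(template, 0, tplRe, 0, Math.min(template.length, WINDOW));
        fft(tplRe, tplIm, false);                        // template spectrum computed once
        int best = -1;
        double bestPeak = Double.NEGATIVE_INFINITY;
        for (int off = 0; off + WINDOW <= signal.length; off += HOP) {
            // For real use, normalize by the window and template energies so that
            // peaks from loud and quiet regions are comparable.
            double p = windowPeak(signal, off, tplRe, tplIm);
            if (p > bestPeak) { bestPeak = p; best = off; }
        }
        return best;                                     // true start lies within ~HOP samples of this
    }
}
```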
On the implementation detail side of things, I assume you already pre-compute the chunks from the smaller file? Also, you don't say whether you check the correlation in the time or frequency domains. You could look for flat magnitude in the frequency domain, which would equate to a spike in the time domain, to save yourself the inverse FFT. You'll have to do some experiments to determine how flat the spectrum is, but this might cut the time down quite a bit.
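One way to quantify "flat" is a spectral-flatness measure (geometric mean over arithmetic mean) of the cross-spectrum magnitudes. A sketch, assuming the two chunks' FFTs are available as parallel real/imaginary arrays (that layout is an assumption, not something stated in the question):

```java
// Sketch only: score how flat the cross-spectrum magnitude is, without an inverse FFT.
public final class SpectrumCheck {

    /** Spectral flatness of |A[k] * conj(B[k])|; values near 1.0 mean a flat magnitude spectrum. */
    static double crossSpectrumFlatness(double[] aRe, double[] aIm, double[] bRe, double[] bIm) {
        int n = aRe.length;
        double logSum = 0, sum = 0;
        for (int k = 0; k < n; k++) {
            // |A[k] * conj(B[k])| = |A[k]| * |B[k]|
            double mag = Math.hypot(aRe[k], aIm[k]) * Math.hypot(bRe[k], bIm[k]);
            mag = Math.max(mag, 1e-12);      // guard against log(0)
            logSum += Math.log(mag);
            sum += mag;
        }
        double geometricMean = Math.exp(logSum / n);
        double arithmeticMean = sum / n;
        return geometricMean / arithmeticMean;
    }
}
```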
Upvotes: 2
Reputation: 16209
I'm not sure I completely grasp the algorithm you are using, but here's a thought: if you can get waves recognized by manually snipping away bits at the beginning and end, isn't that a possible approach for your algorithm as well?
Upvotes: 0
Reputation: 185852
Use a convolution filter to compare two waveforms. It will tell you if and where a match occurs. Fast algorithms to compute convolutions are available.
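For illustration, here is a brute-force version of that idea in Java (the language used in the question): a matched-filter style search that reports where the best match occurs. The class and method names are mine; as noted above, fast FFT-based convolution computes the same result much more quickly.

```java
// Sketch only: correlate the short signal against every offset of the long one.
// Correlation with the time-reversed template is the same operation as convolution.
public final class MatchedFilter {

    /** Returns the sample offset in `haystack` at which `needle` correlates best. */
    static int bestLag(double[] needle, double[] haystack) {
        double needleEnergy = 0;
        for (double v : needle) needleEnergy += v * v;

        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag + needle.length <= haystack.length; lag++) {
            double dot = 0, windowEnergy = 0;
            for (int i = 0; i < needle.length; i++) {
                double s = haystack[lag + i];
                dot += needle[i] * s;
                windowEnergy += s * s;
            }
            // Normalize so loud regions of the haystack do not dominate the score.
            double denom = Math.sqrt(needleEnergy * windowEnergy);
            double score = denom == 0 ? 0 : dot / denom;
            if (score > bestScore) { bestScore = score; best = lag; }
        }
        return best;
    }
}
```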
Upvotes: 2