Reputation: 13329
I am getting audio fingerprints from sound clips, using fpcalc. They look like this:
AQAAE9GSKVOkLEOy5PlQE0d9fId7HD-aHD_xhMeRrKORLseX44etHD8AYcAgSrEjDKFAsIGIFAJZ
AQAAE1M9RUkW1NGFH0d4HcnyJIlw4UW17HiyPMHt4B18EX2go9qJTz_eJzgBgBg4CphigUCMGCWFAcAw
AQAAAA
Now I record a sound and fingerprint it. The result might look like this:
AQAAE5ISLVOkTEF-QfURpkGZHHeeIpehB3HMoRKaikbTKHvQNnlwpIdOxNHHY_IPJttlAECEI8BBAAgFAiigAA
Now I'm searching my database for the closest match using Levenshtein distance, like this:
def levenshtein_distance(first, second):
    """Find the Levenshtein distance between two strings."""
    # Make `first` the shorter string.
    if len(first) > len(second):
        first, second = second, first
    if len(second) == 0:
        return len(first)
    first_length = len(first) + 1
    second_length = len(second) + 1
    distance_matrix = [[0] * second_length for x in range(first_length)]
    # Distance from the empty prefix is the number of characters added.
    for i in range(first_length):
        distance_matrix[i][0] = i
    for j in range(second_length):
        distance_matrix[0][j] = j
    for i in range(1, first_length):
        for j in range(1, second_length):
            deletion = distance_matrix[i-1][j] + 1
            insertion = distance_matrix[i][j-1] + 1
            substitution = distance_matrix[i-1][j-1]
            if first[i-1] != second[j-1]:
                substitution += 1
            distance_matrix[i][j] = min(insertion, deletion, substitution)
    return distance_matrix[first_length-1][second_length-1]
I'm not getting good results: the recorded sounds do not match the samples I give it well.
Am I doing this correctly? Are there better fingerprinting libraries out there? I'm using Python or Ruby.
I'm trying to match a whistle to a bird call.
Upvotes: 1
Views: 3197
Reputation: 970
Run fpcalc with the -raw option to get the 32-bit integers you need to compare.
./fpcalc -raw audio.wav
For a very easy comparison, reduce each fingerprint value to its top 20 bits:
Python example
fps_20 = [x >> 12 for x in fps]
and count the differences between the two lists.
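A minimal sketch of that comparison, under a few assumptions that are not in the answer itself: the raw output contains a line shaped like FINGERPRINT=1,2,3, the helper names (parse_raw_fingerprint, to_20_bits, count_differences) are made up for illustration, values are masked to 32 bits in case they are printed as signed numbers, and any difference in length is counted as mismatches. It also compares position by position with no time alignment, so treat it as a starting point rather than a tuned matcher.

```python
def parse_raw_fingerprint(output):
    """Extract the integer list from text shaped like `fpcalc -raw` output.

    Assumes a line of the form FINGERPRINT=1,2,3 (check what your
    fpcalc version actually prints).
    """
    for line in output.splitlines():
        if line.startswith("FINGERPRINT="):
            values = line.split("=", 1)[1]
            return [int(v) for v in values.split(",") if v.strip()]
    raise ValueError("no FINGERPRINT= line found")


def to_20_bits(fps):
    """Keep the top 20 bits of each 32-bit value.

    The mask is an added safeguard for values printed as signed integers.
    """
    return [(x & 0xFFFFFFFF) >> 12 for x in fps]


def count_differences(fps_a, fps_b):
    """Count positions where the reduced values differ.

    Positions present in only one list are counted as differences
    (an illustrative choice, not part of the original answer).
    """
    a, b = to_20_bits(fps_a), to_20_bits(fps_b)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))
```

To pick the closest entry in a database, compute count_differences between the recorded fingerprint and each stored one and take the smallest count.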
Upvotes: 2
Reputation: 1656
First, you should not compare the code strings directly. I do not know which algorithm fpcalc is based on, but it very likely measures some audio features (such as energy or MFCCs, as mentioned above) on each frame of your audio input. These features may be integer values that are then converted to a string (or a base64 string). So comparing these strings character by character does not make sense (unless you are trying to identify identical audio content).
I am not sure I understand what you are trying to do ("I'm trying to match a whistle to a bird call"), but I do not think it can be solved with audio fingerprinting, since fingerprinting is designed to recognize "almost similar" audio content.
Upvotes: 2
Reputation: 3478
Fingerprinting methods do not work well for what you need!
I have seen Mel Frequency Cepstral Coefficients (MFCCs) used to solve this kind of problem.
There are other methods too, such as extracting a set of descriptors (mean irregularity, mean centroid, standard deviation of irregularity, MFCCs) and feeding them to a classification method (Random Forests, MLP).
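As a rough illustration of one of those descriptors, here is a sketch of the mean spectral centroid using only the standard library. The function names and the frame size are hypothetical choices, the naive DFT is slow and only meant to show the idea, and MFCC extraction and the classifier step are not covered here because they normally rely on dedicated libraries.

```python
import cmath
import math


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def mean_centroid(samples, sample_rate, frame_size=256):
    """Average the centroid over consecutive non-overlapping frames."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    if not frames:
        return 0.0
    return sum(spectral_centroid(f, sample_rate) for f in frames) / len(frames)
```

A whistle and a bird call with similar pitch should give similar mean centroids, so values like this (together with the other descriptors) could form the feature vector passed to a classifier.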
Upvotes: 1