I am using the datasketch library to check whether audio 2 and audio 3 are similar to audio 1. However, even with threshold=1, where it should only return audios that are 100% identical, the query also returns the other two audios, which are very different from the first one. The link to the audios. All of them are different recordings, but each has the same length of 29 seconds.
import librosa
from datasketch import MinHash, MinHashLSH

# Load each file, compute its MFCC matrix, and serialize it to raw bytes.
x1, sr1 = librosa.load(r'path\f1.mp3')
mfcc1 = librosa.feature.mfcc(y=x1, sr=sr1)
mfcc1 = mfcc1.tobytes()
x2, sr2 = librosa.load(r'path\f2.mp3')
mfcc2 = librosa.feature.mfcc(y=x2, sr=sr2)
mfcc2 = mfcc2.tobytes()
x3, sr3 = librosa.load(r'path\f3.mp3')
mfcc3 = librosa.feature.mfcc(y=x3, sr=sr3)
mfcc3 = mfcc3.tobytes()

# One MinHash per audio, using Python's built-in hash as the hash function.
minhash1 = MinHash(num_perm=128, hashfunc=hash)
minhash2 = MinHash(num_perm=128, hashfunc=hash)
minhash3 = MinHash(num_perm=128, hashfunc=hash)

# Update each MinHash with the elements of its serialized MFCC.
for col1 in mfcc1:
    minhash1.update(col1)
for col2 in mfcc2:
    minhash2.update(col2)
for col3 in mfcc3:
    minhash3.update(col3)

# Index audios 2 and 3, then query with audio 1.
lsh = MinHashLSH(threshold=1, num_perm=128)
lsh.insert("minhash2", minhash2)
lsh.insert("minhash3", minhash3)
result = lsh.query(minhash1)
print(result)
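One thing I noticed while debugging, offered as a possible cause and not a confirmed diagnosis: iterating over a `bytes` object in Python yields one integer in the range 0-255 per byte, so each loop above feeds at most 256 distinct elements into its MinHash. The sketch below uses only the standard library to show what that does to the Jaccard similarity of two unrelated arrays. The arrays are random stand-ins, not real MFCCs, and the names `byte_value_set`, `jaccard`, `fake_mfcc_a` and `fake_mfcc_b` are hypothetical helpers made up for this illustration. It also assumes the MFCC values are stored as 32-bit floats.

```python
import random
import struct


def byte_value_set(values):
    """Pack floats as float32 bytes and return the set of distinct byte values.

    Iterating over a bytes object yields ints in range(256), so this is the
    set of elements that a per-byte loop would feed into a MinHash.
    """
    raw = struct.pack(f"{len(values)}f", *values)
    return set(raw)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


rng = random.Random(0)
# Hypothetical stand-ins for two unrelated MFCC matrices (not real audio).
fake_mfcc_a = [rng.gauss(0.0, 50.0) for _ in range(25_000)]
fake_mfcc_b = [rng.gauss(-20.0, 80.0) for _ in range(25_000)]

set_a = byte_value_set(fake_mfcc_a)
set_b = byte_value_set(fake_mfcc_b)

# Each set can hold at most 256 elements, no matter how long the input is.
print(len(set_a), len(set_b), jaccard(set_a, set_b))
```

If the two sets come out nearly identical here, the same thing may be happening with the real MFCC bytes, which would explain why every audio matches even at threshold=1.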