Faizan Ul Haq

Reputation: 1

Using datasketch to find similarity between 3 audio files using MFCCs

I am using the datasketch library to check whether audio 2 and audio 3 are similar to audio 1. However, even at threshold=1, where it should only output audios that are 100% the same, it also outputs the other 2 audios, which are really different from the 1st audio. The link to the audios. All of them are different audios, but with the same 29-second length.

import librosa
from datasketch import MinHash, MinHashLSH

x1 , Sr1 = librosa.load(r'path\f1.mp3')
mfcc1 = librosa.feature.mfcc(y=x1 , sr=Sr1)
mfcc1 = mfcc1.tobytes()

x2 , Sr2 = librosa.load(r'path\f2.mp3')
mfcc2 = librosa.feature.mfcc(y=x2 , sr=Sr2)
mfcc2 = mfcc2.tobytes()

x3 , Sr3 = librosa.load(r'path\f3.mp3')
mfcc3 = librosa.feature.mfcc(y=x3 , sr=Sr3)
mfcc3 = mfcc3.tobytes()

minhash1 = MinHash(num_perm=128 , hashfunc=hash)
minhash2 = MinHash(num_perm=128 , hashfunc=hash)
minhash3 = MinHash(num_perm=128 , hashfunc=hash)

for col1 in mfcc1:
    minhash1.update(col1)

for col2 in mfcc2:
    minhash2.update(col2)

for col3 in mfcc3:
    minhash3.update(col3)

lsh = MinHashLSH(threshold= 1 , num_perm=128)
lsh.insert("minhash2",minhash2)
lsh.insert("minhash3",minhash3)
result=lsh.query(minhash1)
print(result)
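One possible explanation, sketched here without librosa or datasketch: iterating over the result of `tobytes()` yields one int per byte, so each audio is reduced to a set of at most 256 byte values, and in CPython `hash` of such a small int is the int itself. Two long, unrelated float buffers tend to contain nearly every byte value, so their sets come out almost identical. The helper `fake_feature_bytes` and its random values are hypothetical stand-ins for `mfcc.tobytes()`, not real MFCC data.

```python
import random
import struct


def fake_feature_bytes(seed, n_values=5000):
    """Pack pseudo-random float32 values into a bytes object.

    Hypothetical stand-in for mfcc.tobytes(); these are not real MFCCs.
    """
    rng = random.Random(seed)
    values = [rng.uniform(-500.0, 500.0) for _ in range(n_values)]
    return struct.pack(f"{n_values}f", *values)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


bytes_a = fake_feature_bytes(seed=1)
bytes_b = fake_feature_bytes(seed=2)

# Iterating over a bytes object yields ints in range(256), one per byte,
# which is what the `for col in mfcc` loops feed into MinHash.update().
items_a = set(bytes_a)
items_b = set(bytes_b)
byte_level_similarity = jaccard(items_a, items_b)

# For contrast: treat every 4-byte float as one set element instead.
chunks_a = {bytes_a[i:i + 4] for i in range(0, len(bytes_a), 4)}
chunks_b = {bytes_b[i:i + 4] for i in range(0, len(bytes_b), 4)}
chunk_level_similarity = jaccard(chunks_a, chunks_b)

print("distinct byte values:", len(items_a), len(items_b))
print("byte-level Jaccard:", byte_level_similarity)
print("float-level Jaccard:", chunk_level_similarity)
```

If this is indeed the cause, one option might be to update each MinHash with larger units, for example quantized MFCC frames passed as bytes and hashed with the default `hashfunc`. Whether that gives a meaningful audio similarity is a separate question.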

Upvotes: 0

Views: 110

Answers (0)
