John

Reputation: 29

My note detection algorithm fails in a few cases

I am using a simple approach to identify musical notes using the FFT in Python. The steps involved are:

  1. Reading the sound file (.wav)
  2. Detecting silence in the file (by computing the sum of the squared input samples that fall within the window)
  3. Detecting the location of notes using the data obtained from (2)
  4. Calculating the frequency of each detected note using the DFT
  5. Matching the calculated frequency to the standard frequencies of notes to identify the note that is being played.

But in a case where the note should come out as A4 (440 Hz), I am getting a huge deviation (2 kHz). Is there a fundamental error in my approach?
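Step 5 above (matching a calculated frequency to the nearest standard note) can be sketched as follows. This is a minimal illustration, not the code from the question: it assumes twelve-tone equal temperament with A4 = 440 Hz, and `nearest_note` and `NOTE_NAMES` are hypothetical names introduced only for this example.

```python
import math

# Note names within one octave, starting at C (sharps only, for brevity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(frequency_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) for the nearest equal-tempered note."""
    # Distance from A4 in semitones; A4 is MIDI note number 69.
    midi_float = 69 + 12 * math.log2(frequency_hz / a4_hz)
    midi = int(round(midi_float))
    # MIDI note 60 is C4, so the octave number is midi // 12 - 1.
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    cents = 100 * (midi_float - midi)
    return name, cents


print(nearest_note(440.0))    # ('A4', 0.0)
print(nearest_note(2000.0))   # the nearest note is B6, more than two octaves above A4
```

Reporting the deviation in cents alongside the name makes it easy to see whether an estimate is slightly off (a few cents) or has landed on a different note altogether.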

UPDATE: How can I pass my audio.wav file to this frequency estimator?

The complete Python code is here:

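import wave

import numpy as np

window_size = 2000     # Size of window to be used for detecting silence
beta = 1               # Silence detection parameter
max_notes = 100        # Maximum number of notes in file, for efficiency
sampling_freq = 44100  # Sampling frequency of audio signal
threshold = 200

# reading the sound file (assumed here: mono, 16-bit PCM, raw integer scale)
with wave.open("audio.wav", "rb") as wav_file:
    raw = wav_file.readframes(wav_file.getnframes())
sound = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
sound_square = np.square(sound)

frequency = []  # calculated frequency of each detected note
i = 0           # start of the current window
k = 0           # start of the current note

# traversing sound_square array with a fixed window_size
while i <= len(sound_square) - window_size:
    # energy of the current window (exactly window_size samples)
    s = np.sum(sound_square[i:i + window_size])
    # detecting the silence waves
    if s < threshold:
        if i - k > window_size * 4:
            # applying fourier transform function; keep magnitudes only
            dft = np.abs(np.fft.rfft(sound[k:i]))
            # index of the strongest bin, skipping the DC bin at index 0
            i_max = int(np.argmax(dft[1:])) + 1
            # calculating frequency
            frequency.append(i_max * sampling_freq / (i - k))
            k = i + 1
    i = i + window_size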

Upvotes: 2

Views: 881

Answers (2)

Jon Nordby

Reputation: 6289

Pitch tracking is implemented in librosa as librosa.piptrack: https://librosa.github.io/librosa/generated/librosa.core.piptrack.html#librosa.core.piptrack

Upvotes: 0

hotpaw2

Reputation: 70733

Pitch is not the same as the peak-magnitude frequency bin of an FFT. Pitch is a human psycho-acoustic phenomenon. A pitched sound can have a missing or very weak fundamental (common in some voice, piano and guitar sounds) and/or many powerful overtones in its spectrum that overwhelm the pitch frequency, yet a human still hears it as that pitch. So any FFT peak frequency detector (even one that includes windowing and interpolation, which your code does not) will not be a robust method of musical pitch estimation. An FFT will also quantize frequency to some bin resolution (perhaps coarser than your requirements) that depends on the FFT (or window) length.
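To make the windowing, interpolation and bin-resolution points concrete, here is a minimal sketch of an FFT peak estimator. The Hann window and the parabolic interpolation on log magnitudes are choices made for this illustration, and `fft_peak_frequency` is a hypothetical helper name. It still reports only the strongest spectral peak, so it shares the limitation described above.

```python
import numpy as np


def fft_peak_frequency(samples, sampling_freq):
    """Estimate the strongest spectral peak (not necessarily the pitch)."""
    n = len(samples)
    # A Hann window reduces leakage caused by the abrupt segment edges.
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(n)))
    # Strongest bin, excluding the first and last bins so both neighbours exist.
    peak = int(np.argmax(spectrum[1:-1])) + 1
    # Fit a parabola through the log magnitudes of the peak and its neighbours.
    a, b, c = np.log(spectrum[peak - 1:peak + 2] + 1e-12)
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (peak + offset) * sampling_freq / n


sampling_freq = 44100
n = 4096
t = np.arange(n) / sampling_freq
tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic test tone, pure 440 Hz sine

print("bin resolution (Hz):", sampling_freq / n)
print("estimate (Hz):", fft_peak_frequency(tone, sampling_freq))
```

With 4096 samples at 44100 Hz the bins are roughly 10.8 Hz apart, which is already close to half the spacing between A4 and its neighbouring semitones. Interpolation recovers a finer estimate for a clean sine, but it cannot help when the strongest peak is an overtone.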

An answer to this Stack Overflow question includes a list of alternative pitch estimation methods that might produce better results.
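The linked list is not reproduced here. As one example of an alternative to FFT peak picking, autocorrelation is a commonly used approach, and the sketch below is only an illustration of it under stated assumptions: `autocorrelation_pitch` is a hypothetical helper, the `fmin`/`fmax` search range is arbitrary, and the test signal is synthetic (harmonics 2 to 4 of 440 Hz with the fundamental removed).

```python
import numpy as np


def autocorrelation_pitch(samples, sampling_freq, fmin=50.0, fmax=2000.0):
    """Estimate pitch from the lag with the strongest autocorrelation."""
    samples = samples - np.mean(samples)
    n = len(samples)
    # Autocorrelation via the FFT, zero-padded to avoid circular wrap-around.
    spectrum = np.fft.rfft(samples, 2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:n]
    # Only search lags that correspond to the allowed pitch range.
    lag_min = int(sampling_freq / fmax)
    lag_max = min(int(sampling_freq / fmin), n - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sampling_freq / lag


sampling_freq = 44100
n = 4096
t = np.arange(n) / sampling_freq
# Harmonics 2, 3 and 4 of 440 Hz; the 440 Hz fundamental itself is absent.
tone = (1.0 * np.sin(2 * np.pi * 880 * t)
        + 0.6 * np.sin(2 * np.pi * 1320 * t)
        + 0.4 * np.sin(2 * np.pi * 1760 * t))

fft_peak = np.argmax(np.abs(np.fft.rfft(tone))) * sampling_freq / n
acf_pitch = autocorrelation_pitch(tone, sampling_freq)
print("FFT peak (Hz):", fft_peak)
print("autocorrelation (Hz):", acf_pitch)
```

On this signal the FFT peak lands near 880 Hz, an octave too high, while the autocorrelation estimate lands near 440 Hz because the waveform still repeats at the period of the missing fundamental. The estimate is limited to whole-sample lags, so a practical version would refine the lag further.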

Upvotes: 1
