Need explanation how specgram function work in python (matplotlib - MATLAB compatible functions)

Question

I'm converting my code from Python to Objective-C. Inside the matplotlib.mlab.specgram function I see three important function calls before the FFT:

 result = stride_windows(x, NFFT, noverlap, axis=0)
 result = detrend(result, detrend_func, axis=0)
 result, windowVals = apply_window(result, window, axis=0,
                                      return_window=True)
 result = np.fft.fft(result, n=pad_to, axis=0)[:numFreqs, :]

I stepped through the code in a debugger to understand the purpose of each one. For example, given this input array:

x = [1,2,3,4,5,6,7,8,9,10,11,12]

After the first function, stride_windows (is this one meant to prevent leakage?), with NFFT = 4 and noverlap = 2:

x = [ [1,3,5,7,9],
      [2,4,6,8,10],
      [3,5,7,9,11],
      [4,6,8,10,12] 
    ]

After detrend nothing changes (I understand the purpose of detrending before the FFT).

Inside apply_window (I don't understand this step):

    xshape = list(x.shape)
    xshapetarg = xshape.pop(axis)  # = 4
    windowVals = window(np.ones(xshapetarg, dtype=x.dtype))
    # result: 4 elements, [0.0, 0.75, 0.75, 0.0]
    xshapeother = xshape.pop()  # = 5
    otheraxis = (axis+1) % 2  # = 1
    windowValsRep = stride_repeat(windowVals, xshapeother, axis=otheraxis)
    # result: windowValsRep = [
    #     [0.  , 0.  , 0.  , 0.  , 0.  ],
    #     [0.75, 0.75, 0.75, 0.75, 0.75],
    #     [0.75, 0.75, 0.75, 0.75, 0.75],
    #     [0.  , 0.  , 0.  , 0.  , 0.  ]
    # ]

then multiply it with x

windowValsRep * x

Now

 x =    [ 
          [ 0.  , 0.   , 0.   , 0.   , 0.   ],
          [ 1.5 , 3    , 4.5  , 6.   , 7.5  ],
          [ 2.25, 3.75 , 5.25 , 6.75 , 8.25 ],
          [ 0.  , 0.   , 0.   , 0.   , 0.   ] 
        ]

The final step is the FFT. As far as I know, an FFT only needs a 1D array, but here it processes a 2D array. Why?

result = np.fft.fft(x, n=pad_to, axis=0)[:numFreqs, :]

Could anyone explain, step by step, why the data needs to be processed like this before the FFT?

Thanks,

TheBlackCat · Accepted Answer

Spectrograms and FFTs are not the same thing. The purpose of a spectrogram is to take the FFT of small, equal-sized time chunks. This produces a 2D array where the X axis is the time of each chunk, the Y axis is frequency, and each value is the energy (or power, etc.) at that frequency in that time chunk. This allows you to see how the frequency components change over time.

This is explained in the documentation for the specgram function:

Data are split into NFFT length segments and the spectrum of each section is computed. The windowing function window is applied to each segment, and the amount of overlap of each segment is specified with noverlap.

As for the individual functions, a lot of what you are asking is described in the documentation for each function, but I will try to explain in a bit more detail.

The purpose of stride_windows, as described in the documentation, is to convert the 1D array of data into a 2D array of successive time chunks. These are the time chunks that will have their FFT calculated in the final spectrogram. In your case they are length-4 (NFFT=4) time chunks (notice the 4 elements per column). Because you set noverlap=2, the last 2 elements of each column are the same as the first 2 elements of the next column (that is what the overlap means). It is called "stride" because it uses a trick regarding the internal storage of numpy arrays to allow it to create an array with the overlapping windows without taking any additional memory.
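To make the stride trick concrete, here is a minimal sketch that rebuilds the same 4x5 array with NumPy's as_strided. The helper name overlapping_windows is made up for illustration and is not the matplotlib implementation; it only assumes a 1D input at least n samples long.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def overlapping_windows(x, n, noverlap):
    """Hypothetical helper: each column is one length-n chunk of x."""
    x = np.ascontiguousarray(x)
    step = n - noverlap                      # samples between chunk starts
    n_chunks = (x.size - noverlap) // step   # number of complete chunks
    # Moving down a row advances one sample; moving right one column
    # advances `step` samples. Only the strides change, no data is copied.
    shape = (n, n_chunks)
    strides = (x.strides[0], step * x.strides[0])
    return as_strided(x, shape=shape, strides=strides, writeable=False)


x = np.arange(1, 13)
windows = overlapping_windows(x, n=4, noverlap=2)
print(windows)
```

When porting to a language without strided views, an ordinary loop that copies samples start..start+n-1 for start = 0, step, 2*step, ... gives the same columns.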

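The detrend function, as its name implies and as is described in its documentation, removes the trend from a signal. By default it uses the mean, which, as the detrend_mean documentation describes, removes the mean (DC offset) of the signal. In your run nothing changed, which suggests the detrend_func that was passed in is the no-op detrend_none; as far as I know, that is what specgram uses unless you ask for something else.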
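As a rough illustration of what mean detrending does to the chunked data, the sketch below subtracts each column's mean using plain NumPy. It is not the mlab code, just the same arithmetic, and the array is the chunked example from the question.

```python
import numpy as np

# The 4x5 array of time chunks from the question (one chunk per column).
windows = np.array([[1, 3, 5, 7, 9],
                    [2, 4, 6, 8, 10],
                    [3, 5, 7, 9, 11],
                    [4, 6, 8, 10, 12]], dtype=float)

# Mean detrending: remove the DC offset of every chunk independently.
detrended = windows - windows.mean(axis=0, keepdims=True)
print(detrended[:, 0])
```

Every column here is a ramp of four consecutive integers, so after the mean is removed all columns become identical and each one sums to zero.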

The apply_window function does exactly what its name implies, and what its documentation says: it applies a window function to each of the time chunks. This is needed because suddenly cutting off the signal at the beginning and end of each time chunk causes large bursts of broadband energy, called transients, that will mess up the spectrogram. Windowing the signal reduces those transients. By default the spectrogram function uses the Hanning window, which attenuates the beginning and end of each time chunk.
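The windowing step boils down to multiplying every column by the same window values. The sketch below does that with NumPy broadcasting instead of the stride_repeat call from the question; it assumes a length-4 Hanning window, which matches the [0, 0.75, 0.75, 0] values you saw.

```python
import numpy as np

# The 4x5 array of time chunks from the question (one chunk per column).
windows = np.array([[1, 3, 5, 7, 9],
                    [2, 4, 6, 8, 10],
                    [3, 5, 7, 9, 11],
                    [4, 6, 8, 10, 12]], dtype=float)

window_vals = np.hanning(windows.shape[0])   # one weight per sample in a chunk

# Broadcasting a (4, 1) column against the (4, 5) array scales every chunk
# by the same window, without building the repeated 4x5 window array.
windowed = windows * window_vals[:, np.newaxis]
print(windowed)
```

In Objective-C this is just a nested loop: for each chunk, multiply sample i by window value i.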

The FFT isn't really 2D. The numpy FFT function allows you to specify an axis to take an FFT over. So in this case, we have a 2D array, and we take the FFT of each column of that array. It is much cleaner and a little faster to do this in one step rather than manually looping over each column.
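A small check of that claim, using the windowed array from the question: the single call with axis=0 gives the same result as an explicit loop of 1D FFTs over the columns. The one-sided slice at the end assumes real input with no extra padding, in which case keeping the first NFFT // 2 + 1 rows corresponds to the [:numFreqs, :] slice.

```python
import numpy as np

# The windowed 4x5 array from the question (one time chunk per column).
windowed = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                     [1.5, 3.0, 4.5, 6.0, 7.5],
                     [2.25, 3.75, 5.25, 6.75, 8.25],
                     [0.0, 0.0, 0.0, 0.0, 0.0]])

# One call: a separate 1D FFT down each column.
spectrum = np.fft.fft(windowed, axis=0)

# The same thing written as an explicit loop over the columns.
looped = np.empty_like(spectrum)
for k in range(windowed.shape[1]):
    looped[:, k] = np.fft.fft(windowed[:, k])

print(np.allclose(spectrum, looped))  # True

# Assumption: real input, no padding, so only the non-negative
# frequencies are kept.
num_freqs = windowed.shape[0] // 2 + 1
one_sided = spectrum[:num_freqs, :]
```

So when porting, you only need a 1D FFT routine and a loop over the time chunks.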
