havakok

Reputation: 1247

Difference between output of python librosa.core.stft() and matlab spectrogram(x)

I am converting Python code to MATLAB. The Python code uses the following command:

stft_ch = librosa.core.stft(audio_input[:, ch_cnt], n_fft=self._nfft, 
                            hop_length=self._hop_len, win_length=self._win_len, 
                            window='hann')

where audio_input.shape=(2880000, 4), self._nfft=2048, self._hop_len=960 and self._win_len=1920.

When converting to MATLAB I used:

stft_ch = spectrogram(audio_input(:, ch_cnt), hann(win_len), win_len-hop_len, nfft);

where I verified size(audio_input)=2880000, 4, win_len=1920, win_len-hop_len=960 and nfft=2048.

I am getting an output from MATLAB with size(stft_ch)=1025, 2999, whereas Python shows stft_ch.shape=(1025, 3001). The size 2999 in the MATLAB output is clear and fits the documentation, where k = ⌊(Nx – noverlap)/(length(window) – noverlap)⌋ if window is a vector.
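As a quick check of that formula against the values in the question, here is a minimal Python sketch. The helper name matlab_spectrogram_columns is made up for illustration; it is not part of MATLAB or librosa, and it only evaluates the documented expression.

```python
def matlab_spectrogram_columns(n_samples, window_length, n_overlap):
    """Evaluate k = floor((Nx - noverlap) / (length(window) - noverlap)).

    Hypothetical helper: it mirrors the formula quoted from the MATLAB
    documentation and does not call MATLAB itself.
    """
    hop = window_length - n_overlap  # samples between window starts
    return (n_samples - n_overlap) // hop


# Values from the question: Nx = 2880000, window = 1920, noverlap = 960.
k = matlab_spectrogram_columns(2880000, 1920, 960)
print(k)
```

With these inputs the expression evaluates to the 2999 columns reported by MATLAB.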

However, I could not find in the Python documentation how the length of t is set.

Why do the sizes differ? Is my conversion correct?

Is there a Python function that produces output closer to MATLAB's spectrogram(), so that I can get the complex output with the same size?

Upvotes: 3

Views: 1610

Answers (2)

RG45

Reputation: 11

There is another issue related to this topic. In MATLAB, the stft function is called as

stft(x,d,'Window',win,'OverlapLength',overlap,'FFTLength',nfft);

where we specify a window function (default: hann(128,'periodic')) that slides over the signal, with consecutive windows overlapping by OverlapLength samples.

In Python, librosa.stft is defined as

stftMatrix_complex = librosa.stft(data_frame, n_fft=n_fft,hop_length=hop_length_fft,win_length=win_len,window='hann',dtype=np.float64)

where the documentation states that win_length defaults to n_fft.

However, changing the value of win_length has no apparent effect on the STFT computation. The number of time frames appears to depend only on the length of data_frame and on hop_length.

I could not make out why this is happening.

The center parameter defaults to True, which, according to the documentation, simplifies the alignment of D onto a time grid by means of librosa.frames_to_samples.

The number of columns (time windows) in the STFT matrix is larger for Python's librosa.stft than for MATLAB's stft function.
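The frame counts can be reproduced with a small sketch. It assumes that librosa pads the signal by n_fft // 2 samples on each side when center=True and then takes one frame of length n_fft every hop_length samples; win_length does not enter the count under that assumption. The function name librosa_stft_frames is hypothetical and the code does not import librosa.

```python
def librosa_stft_frames(n_samples, n_fft, hop_length, center=True):
    """Estimate the number of STFT columns.

    Assumption (not taken from the thread): with center=True the signal
    is padded by n_fft // 2 on both sides, and frames of length n_fft are
    taken every hop_length samples. win_length is deliberately absent.
    """
    if center:
        n_samples += 2 * (n_fft // 2)  # padding on both ends
    return 1 + (n_samples - n_fft) // hop_length


# Values from the question: 2880000 samples, n_fft=2048, hop_length=960.
frames_centered = librosa_stft_frames(2880000, 2048, 960, center=True)
frames_uncentered = librosa_stft_frames(2880000, 2048, 960, center=False)
print(frames_centered, frames_uncentered)
```

Under these assumptions the centered case gives the 3001 columns seen in Python, and changing win_length leaves the count unchanged, which matches the behaviour described above.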

If anyone knows the answer, please clarify. Thanks.

Upvotes: 1

havakok

Reputation: 1247

I have found the answer myself.

The MATLAB function spectrogram() outputs a vector of times corresponding to the middle of each window, and omits the last, incomplete window. For example, a signal 10 samples long with a 3-sample window and a 1-sample overlap results in the following 4 windows:

1:3,3:5,5:7,7:9, where m:n represents a window including samples from m to n including the nth sample.

The centers for the windows would, therefore, be: 2,4,6,8. Note that the 10th sample is not included.

It seems that MATLAB uses the maximal number_of_windows subject to (number_of_windows-1)*hop_length+window_size<=number_of_samples.
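The 10-sample example can be enumerated with a short Python sketch. The function full_windows is a hypothetical helper that applies the rule stated above (keep only windows that fit entirely inside the signal); it uses 1-based indices to match the MATLAB notation and is not MATLAB's actual implementation.

```python
def full_windows(n_samples, window_size, hop_length):
    """List 1-based (first, last) sample indices of every window that
    fits entirely inside a signal of n_samples samples."""
    windows = []
    start = 1
    while start + window_size - 1 <= n_samples:
        windows.append((start, start + window_size - 1))
        start += hop_length
    return windows


# 10 samples, 3-sample window, 1-sample overlap, so the hop is 2 samples.
windows = full_windows(10, 3, 3 - 1)
centers = [(first + last) // 2 for first, last in windows]
print(windows)
print(centers)
```

This yields the four windows and the centers 2, 4, 6, 8 listed above; a fifth window starting at sample 9 would need an 11th sample, so it is dropped.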

In the Python version, librosa.core.stft(), on the other hand, t is the time of the first sample of each frame, and the frames cover more than the input signal. For example, a signal 10 samples long with a 3-sample window and 2-sample hops (hops, not overlap) results in the following 5 windows:

1:3,3:5,5:7,7:9,9:11, where m:n represents a window including samples from m to n including the nth sample.

The beginnings for the windows would, therefore, be: 1,3,5,7,9. Note that the 11th non-existing sample is included.

It seems that librosa uses the minimal number_of_windows subject to number_of_windows*hop_length>number_of_samples.

In my case:

(2999-1)*960+1920 = 2880000 <= 2880000 for MATLAB, while in Python 3001*960 = 2880960 > 2880000 but 3000*960 = 2880000 is not greater than 2880000.
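Both rules have a closed form, which the following sketch evaluates for the numbers in my case. The function names are made up for illustration, and the two rules are the ones inferred in this answer, not formulas quoted from either library's documentation.

```python
def max_windows_inside(n_samples, window_size, hop_length):
    """Largest n with (n - 1) * hop_length + window_size <= n_samples
    (the rule inferred above for MATLAB)."""
    return (n_samples - window_size) // hop_length + 1


def min_windows_covering(n_samples, hop_length):
    """Smallest n with n * hop_length > n_samples
    (the rule inferred above for librosa)."""
    return n_samples // hop_length + 1


n_matlab = max_windows_inside(2880000, 1920, 960)
n_python = min_windows_covering(2880000, 960)
print(n_matlab, n_python)
```

The two results are 2999 and 3001, matching the column counts observed in MATLAB and Python respectively.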

Note that the frames can be centered in Python through the center flag of librosa.core.stft(), which defaults to True.

This is the best explanation I could find.

Upvotes: 5
