Reputation: 1247
I am converting Python code to MATLAB. The Python code uses the following command:
stft_ch = librosa.core.stft(audio_input[:, ch_cnt], n_fft=self._nfft,
hop_length=self._hop_len, win_length=self._win_len,
window='hann')
where audio_input.shape=(2880000, 4)
, self._nfft=2048
, self._hop_len=960
and self._win_len=1920
.
When converting to MATLAB I used:
stft_ch = spectrogram(audio_input(:, ch_cnt), hann(win_len), win_len-hop_len, nfft);
where I verified size(audio_input)=2880000, 4
, win_len=1920
, win_len-hop_len=960
and nfft=2048
.
I am getting an output from MATLAB with size(stft_ch)=1025, 2999
whereas Python gives stft_ch.shape=(1025, 3001)
. The size 2999
in the MATLAB output is clear and fits the documentation, where k = ⌊(Nx – noverlap)/(length(window) – noverlap)⌋
if window is a vector.
However, I could not find in the Python documentation how the length of t
is set.
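The MATLAB formula quoted above can be checked numerically. This is only a minimal sketch in plain Python: the helper name matlab_spectrogram_columns is hypothetical, and it just evaluates the documented formula rather than calling MATLAB.

```python
def matlab_spectrogram_columns(nx, window_length, noverlap):
    """Evaluate k = floor((Nx - noverlap) / (length(window) - noverlap))."""
    return (nx - noverlap) // (window_length - noverlap)


# Sizes from the question: 2880000 samples, win_len=1920, hop_len=960.
nx, win_len, hop_len = 2880000, 1920, 960
k = matlab_spectrogram_columns(nx, win_len, win_len - hop_len)
print(k)  # 2999
```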
Why do the sizes differ? Is my conversion correct?
Is there a Python function which produces an output more similar to MATLAB's spectrogram()
so that I can get the complex output with the same size?
Upvotes: 3
Views: 1610
Reputation: 11
There is another issue with this topic.
In MATLAB, the stft
function is called as
stft(x,d,'Window',win,'OverlapLength',overlap,'FFTLength',nfft);
where we specify a window function (default: hann(128,'periodic')
), which slides along the signal according to the OverlapLength
value.
In Python, librosa.stft
is called as
stftMatrix_complex = librosa.stft(data_frame, n_fft=n_fft,hop_length=hop_length_fft,win_length=win_len,window='hann',dtype=np.float64)
where the documentation states that win_length
defaults to n_fft
.
However, changing the value of win_length
has no visible effect on the size of the STFT output. The function appears to take the length of data_frame
and determine the number of time frames from hop_length
alone.
I could not make out why this is happening.
The documentation says that center
defaults to True
, which simplifies the alignment of D
onto a time grid by means of librosa.frames_to_samples
.
The number of columns (time windows) in the STFT matrix is larger for Python's librosa.stft
than for the MATLAB stft
function.
If anyone knows the answer, please clarify. Thanks.
Upvotes: 1
Reputation: 1247
I have found the answer myself.
The MATLAB function spectrogram()
outputs a vector of times corresponding to the middle of each window, and it omits any trailing window that would not fit entirely inside the signal. For example, a 10-sample signal with a 3-sample window and a 1-sample overlap results in the following 4 windows:
1:3
,3:5
,5:7
,7:9
, where m:n
denotes a window covering samples m
through n
, inclusive.
The centers for the windows would, therefore, be: 2,4,6,8
. Note that the 10th sample is not included.
It seems that MATLAB uses the maximal number_of_windows
subject to (number_of_windows-1)*hop_length+window_size<=number_of_samples
.
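The toy example above can be reproduced by enumerating only the windows that fit completely inside the signal. This is a minimal sketch of that rule as described in this answer, not MATLAB's actual implementation; the helper name full_windows is hypothetical.

```python
def full_windows(n_samples, window_size, overlap):
    """List 1-based inclusive (first, last) sample indices of every complete window."""
    hop = window_size - overlap
    windows = []
    first = 1
    # Keep a window only if its last sample still lies inside the signal.
    while first + window_size - 1 <= n_samples:
        windows.append((first, first + window_size - 1))
        first += hop
    return windows


toy = full_windows(10, 3, 1)
print(toy)                             # [(1, 3), (3, 5), (5, 7), (7, 9)]
print([(a + b) // 2 for a, b in toy])  # [2, 4, 6, 8]
```

With the sizes from the question (2880000 samples, window 1920, overlap 960) the same loop yields 2999 windows.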
In the Python version, librosa.core.stft()
, on the other hand, t is the time of the first sample of each frame, and the frames cover more than the input signal. For example, a 10-sample signal with a 3-sample window and a 2-sample hop (hop, not overlap) results in the following 5 windows:
1:3
,3:5
,5:7
,7:9
,9:11
, where m:n
denotes a window covering samples m
through n
, inclusive.
The beginnings for the windows would, therefore, be: 1,3,5,7,9
. Note that the non-existent 11th sample is included.
It seems that librosa uses the minimal number_of_windows
subject to number_of_windows*hop_length>number_of_samples
.
In my case:
(2999-1)*960+1920 = 2880000 <= 2880000 for MATLAB. 3001*960 = 2880960 > 2880000, while 3000*960 = 2880000 is not > 2880000, in Python.
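The two counting rules and the arithmetic for my case can be checked directly. This is a minimal sketch; both helper names are hypothetical, and they encode the inequalities as stated in this answer rather than anything taken from the MATLAB or librosa source.

```python
def max_windows_without_overrun(n_samples, window_size, hop):
    """Largest n with (n - 1) * hop + window_size <= n_samples."""
    return (n_samples - window_size) // hop + 1


def min_windows_past_end(n_samples, hop):
    """Smallest n with n * hop > n_samples."""
    return n_samples // hop + 1


n_samples, win_len, hop_len = 2880000, 1920, 960

n_matlab = max_windows_without_overrun(n_samples, win_len, hop_len)
n_librosa = min_windows_past_end(n_samples, hop_len)

# MATLAB-style rule: 2999 windows fit exactly, a 3000th would overrun.
assert (n_matlab - 1) * hop_len + win_len <= n_samples
assert n_matlab * hop_len + win_len > n_samples

# librosa-style rule: 3000 * 960 equals the length, so 3001 is the first count past it.
assert (n_librosa - 1) * hop_len <= n_samples < n_librosa * hop_len

print(n_matlab, n_librosa)  # 2999 3001
```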
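Note that center=True
is the default in librosa, so unless center=False is passed the signal is padded by n_fft // 2 samples on each side and each frame is centered on its time stamp; this padding is why the frames extend past the end of the input.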
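The effect of the center flag on the number of frames can be sketched as follows. This assumes the framing described in the librosa documentation: with center=True the signal is padded by n_fft // 2 samples on each side, and frames of n_fft samples are then taken every hop_length samples. The helper name librosa_style_frames is hypothetical and does not call librosa.

```python
def librosa_style_frames(n_samples, n_fft, hop_length, center=True):
    """Frame count under the assumed rule: optional n_fft // 2 padding on both sides,
    then only complete frames of n_fft samples are kept."""
    padded = n_samples + 2 * (n_fft // 2) if center else n_samples
    return 1 + (padded - n_fft) // hop_length


print(librosa_style_frames(2880000, 2048, 960, center=True))   # 3001
print(librosa_style_frames(2880000, 2048, 960, center=False))  # 2998
```

Under this assumption the count depends on n_fft and hop_length but not on win_length, so neither setting reproduces MATLAB's 2999 columns exactly for these sizes.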
This is the best explanation I could find.
Upvotes: 5