swe87

Reputation: 179

Why are 128 mel bands used in mel spectrograms?

I am using the mel spectrogram function from librosa, which can be found here: Mel Spectrogram Librosa

I use it as follows:

signal = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=512, n_mels=128)

Why are 128 mel bands used? I understand that the mel filterbank is meant to simulate the "filterbank" in the human ear, which is why it gives higher frequencies coarser resolution than lower ones.
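To make the question more concrete, here is a rough NumPy-only sketch of how mel band edges are spaced for 64 and 128 bands. It assumes the HTK-style mel formula (as far as I know librosa defaults to the Slaney variant, so exact numbers will differ) and a hypothetical sample rate of 16000 Hz, since my real sample_rate is not shown above. The helper names are made up for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa's default differs)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_mels, sr, fmin=0.0, fmax=None):
    # n_mels triangular filters need n_mels + 2 edge frequencies,
    # equally spaced on the mel axis between fmin and fmax
    if fmax is None:
        fmax = sr / 2.0
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(mels)

sr = 16000             # hypothetical sample rate
n_fft = 512            # same n_fft as in the call above
bin_width = sr / n_fft # width of one FFT bin in Hz

narrow_counts = {}
for n_mels in (64, 128):
    edges = mel_band_edges(n_mels, sr)
    spacing = np.diff(edges)  # distance between adjacent edges in Hz
    # Count edge gaps that are narrower than a single FFT bin
    narrow_counts[n_mels] = int(np.sum(spacing < bin_width))
    print(n_mels, "bands: smallest gap", spacing.min(), "Hz,",
          narrow_counts[n_mels], "gaps narrower than one FFT bin")
```

The gaps between edges grow with frequency, so low bands are narrow and high bands are wide. With n_fft=512 and more mel bands, more of the low-frequency gaps become narrower than one FFT bin, which is where a filter can end up covering very few bins.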

I am designing and implementing a Speech-to-Text system with deep learning. When I used n_mels=64, it didn't work at all; it only works with n_mels=128.

Could it be because I am normalizing the spectrogram before feeding it to the network? I am using the librosa.util.normalize function, and it normalizes the mel spectrogram between -1 and 1.
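For reference, this is roughly what I understand the normalization to do with its default arguments (norm=np.inf, axis=0): each column, i.e. each time frame, is divided by its own largest absolute value. The sketch below is a NumPy stand-in and not librosa's actual implementation; normalize_inf and the random mel_power array are hypothetical placeholders for my real data.

```python
import numpy as np

def normalize_inf(S, axis=0):
    # Divide by the max absolute value along `axis` (infinity norm).
    peak = np.max(np.abs(S), axis=axis, keepdims=True)
    # Leave all-zero slices untouched instead of dividing by zero
    peak = np.where(peak > 0, peak, 1.0)
    return S / peak

# Placeholder for a mel power spectrogram: shape (n_mels, n_frames),
# non-negative, with frames of very different loudness
rng = np.random.default_rng(0)
mel_power = rng.random((128, 10)) * np.linspace(1e-3, 1.0, 10)

normed = normalize_inf(mel_power, axis=0)

print(normed.min(), normed.max())  # range of the normalized values
print(normed.max(axis=0))          # per-frame peak after normalization
```

Two things stand out if this sketch matches the real behavior: a power spectrogram is non-negative, so the result lies in [0, 1] rather than [-1, 1], and every frame is rescaled to the same peak, so loudness differences between frames are lost.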

I tried to find an explanation of the reasoning, but the only paper I found was this one, which uses mel band counts ranging from 128 to 512: Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

Output examples when n_mels=128: [image]

Output examples when n_mels=64: [image]

Thanks.

Upvotes: 4

Views: 7508

Answers (0)
