VITTHAL BHANDARI

Reputation: 179

log mel spectrogram using librosa

I have come across two different ways of generating log-mel spectrograms for audio files using librosa. I don't know why their final outputs differ, which one is "correct", or how the two relate to each other.

#1

import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)
# Recent librosa versions require y and sr to be passed as keyword arguments
mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=10, fmax=8000)
log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
librosa.display.specshow(log_mel_spectrogram, x_axis="time", y_axis="mel", sr=sr)

#2

import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)
X = librosa.stft(scale)
Xdb = librosa.amplitude_to_db(abs(X))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')

The respective images are:

[Image: #1] [Image: #2]

**EDIT** After setting the number of mel bins to 64, I obtain the spectrogram below: [Image]

If I want to process many such spectrograms, should I trim off the bold blue portion above, since it is common to all of them? What does the bold, dark region represent? Is it advisable to use the fmax parameter to trim it?

Upvotes: 2

Views: 7778

Answers (1)

Jon Nordby

Reputation: 6259

The second spectrogram is not a mel spectrogram but an STFT (sometimes called "linear") spectrogram. It keeps all the frequency bands from the FFT: (n_fft/2)+1 bands, which is 1025 for n_fft=2048 (the default in librosa.stft). The mel spectrogram, by contrast, has mel filters applied, which reduces the number of bands to n_mels (typically 32-128); in your example it is set to 10.
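To make the band counts concrete, here is a minimal sketch of the arithmetic. It does not call librosa; it only computes the array shapes you would expect. The helper names `stft_shape` and `mel_shape` and the 5-second clip length are made up for illustration, and the frame count assumes centered (padded) framing, which is librosa's documented default.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Expected (frequency bins, frames) of a one-sided STFT.

    Assumes centered (padded) framing, i.e. one frame per hop plus one.
    """
    n_bins = n_fft // 2 + 1                 # (n_fft/2)+1 bands from the FFT
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


def mel_shape(n_samples, n_mels, hop_length=512):
    """Expected (mel bands, frames) of a mel spectrogram.

    The mel filter bank only changes the number of frequency rows;
    the number of frames is the same as for the STFT.
    """
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


sr = 22050           # default sampling rate used by librosa.load
n_samples = 5 * sr   # hypothetical 5-second clip

print("STFT spectrogram:", stft_shape(n_samples))
print("mel, n_mels=10:  ", mel_shape(n_samples, n_mels=10))
print("mel, n_mels=64:  ", mel_shape(n_samples, n_mels=64))
```

Only the first dimension differs between the two: 1025 rows for the STFT versus n_mels rows after the mel filters are applied.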

Upvotes: 2
