Reputation: 179
I have come across two different ways of generating log-mel spectrograms for audio files using librosa. I don't know why their final outputs differ, which one is "correct", or how much one differs from the other.
#1
import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)  # resamples to sr=22050 by default
# y and sr must be passed as keywords in librosa >= 0.10
mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=10, fmax=8000)
log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)  # power -> dB
librosa.display.specshow(log_mel_spectrogram, x_axis="time", y_axis="mel", sr=sr)
#2
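import librosa
import librosa.display
import numpy as np

path = "path/to/my/file"
scale, sr = librosa.load(path)
X = librosa.stft(scale)  # complex STFT, default n_fft=2048
Xdb = librosa.amplitude_to_db(np.abs(X))  # magnitude -> dB
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')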
The respective images are:
** EDIT ** After setting the number of mel bins to 64, I obtain the spectrogram below:
If I want to process many such spectrograms, should I trim off the bold blue portion above, since it is common to all of them? What does the bold, dark region represent? Is it advisable to use the fmax parameter to trim it?
Upvotes: 2
Views: 7778
Reputation: 6259
The second spectrogram is not a mel spectrogram but an STFT (sometimes called "linear") spectrogram. It has all the frequency bands from the FFT: (n_fft/2)+1 bands, which is 1025 for n_fft=2048. The mel spectrogram, by contrast, has mel filters applied, which reduces the number of bands to n_mels (typically 32-128), set to 10 in your example.
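To make the band counts concrete, here is a rough, librosa-free sketch of the arithmetic. It assumes the HTK-style mel formula purely for illustration (librosa's default is the Slaney variant, so its exact band centers will differ), and the helper names `stft_bin_count` and `mel_band_centers` are made up for this example, not librosa functions.

```python
import math

def stft_bin_count(n_fft):
    """Number of frequency rows in a one-sided STFT: n_fft/2 + 1."""
    return n_fft // 2 + 1

def hz_to_mel(f_hz):
    # HTK-style formula (an assumption for this sketch;
    # librosa defaults to the Slaney variant instead)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel above
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    # Points 1..n_mels are band centers; points 0 and n_mels+1 are the outer edges
    return [mel_to_hz(lo + step * i) for i in range(1, n_mels + 1)]

n_fft = 2048
sr = 22050  # librosa.load's default sample rate

print("STFT rows:", stft_bin_count(n_fft))
print("Hz per STFT row:", sr / n_fft)

centers = mel_band_centers(n_mels=10, fmin=0.0, fmax=8000.0)
print("mel rows:", len(centers))
print("mel band centers (Hz):", [round(c) for c in centers])
```

The STFT rows are evenly spaced in Hz, while the mel band centers get further apart as frequency rises, which is part of why the two plots look so different. With librosa installed, comparing `X.shape[0]` and `mel_spectrogram.shape[0]` from the question's snippets should give the same two row counts.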
Upvotes: 2