Reputation: 15002
I have some audio files that I want to convert to log mel spectrograms. I need each log mel spectrogram to have the shape (512, 512). I changed n_mels to 512 to get the first dimension to 512, but I am unable to get the second dimension to 512 for all audio files. I tried experimenting with hop_length values by trial and error; for some audio files it works and for others it doesn't. How do we get a log mel spectrogram of a specific shape using librosa?
```python
import librosa
import librosa.display

path = "path/to/my/file"
scale, sr = librosa.load(path)
# Recent librosa releases require the signal and sampling rate to be passed by keyword.
mel_spectrogram = librosa.feature.melspectrogram(y=scale, sr=sr, n_fft=2048, hop_length=512, n_mels=512, fmax=8000)
log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
librosa.display.specshow(log_mel_spectrogram, x_axis="time", y_axis="mel", sr=sr)
```
Upvotes: 0
Views: 216
Reputation: 6299
The second dimension of a spectrogram is time. So if your audio clips have variable duration, this dimension will vary as well.
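To make the dependence on duration concrete, here is a small sketch of how the frame count follows from the number of samples. It assumes librosa's default `center=True` padding and an even `n_fft`, in which case the number of frames is `1 + n_samples // hop_length`. The helper name `n_frames` is made up for illustration.

```python
def n_frames(n_samples: int, hop_length: int) -> int:
    """Frame count of an STFT with center=True padding and an even n_fft (assumed)."""
    return 1 + n_samples // hop_length


sr = 22050  # default sampling rate used by librosa.load
for seconds in (1.0, 2.5, 10.0):
    n_samples = int(seconds * sr)
    # With hop_length=512 this prints 44, 108 and 431 frames respectively.
    print(seconds, n_frames(n_samples, hop_length=512))
```

With a fixed hop_length, clips of different duration therefore always give a different second dimension.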
The standard approach is to divide your audio clip/stream into fixed-length windows of time, for example 1 second each. You can then set hop_length such that each window yields 512 frames. Note that both the length of the windows and the length of each frame are important parameters that influence how easy the data is to analyze. A conventional choice for the length of one frame for sound data would be 10-40 ms, corresponding to 100-25 frames per second. For data analysis, one usually computes windows with some overlap, say 50% or even 90%.
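The fixed-window idea could be sketched roughly as follows. This is only an illustration under a few assumptions: the helpers `window_length` and `log_mel_windows` are hypothetical names, the windows do not overlap, the last window is zero-padded, and the frame count relies on librosa's default `center=True` behaviour with an even `n_fft`. The window is sized to `(512 - 1) * hop_length` samples so that every window gives exactly 512 frames.

```python
def window_length(hop_length: int, target_frames: int = 512) -> int:
    """Samples per window so that 1 + n_samples // hop_length == target_frames."""
    return (target_frames - 1) * hop_length


def log_mel_windows(path, hop_length=256, sr=22050, n_fft=2048, n_mels=512, fmax=8000):
    """Yield one log mel spectrogram of shape (n_mels, 512) per fixed-length window."""
    # Imported here so that window_length can be used without librosa installed.
    import librosa

    y, sr = librosa.load(path, sr=sr)
    n = window_length(hop_length)
    for start in range(0, len(y), n):
        # Zero-pad the final, shorter window up to n samples.
        chunk = librosa.util.fix_length(y[start:start + n], size=n)
        mel = librosa.feature.melspectrogram(
            y=chunk, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, fmax=fmax
        )
        yield librosa.power_to_db(mel)
```

At 22050 Hz, `hop_length=256` gives windows of 130816 samples (about 5.9 seconds), while `hop_length=43` gives 21973 samples (about 1 second). Overlapping windows could be obtained by stepping `start` by less than `n`.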
Upvotes: 0