Confused about MFCC processing

Question

I extracted MFCC features from an audio file using the Librosa library in Python. This is what the code looks like:

import librosa
import numpy as np

signal, sample_rate = librosa.load('../audio_train/down/00176480_nohash_0.wav', sr=22050)
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)  # shape: (13, n_frames)
np.mean(mfcc.T, axis=0)  # average over frames, shape: (13,)

My question is, why do we have to transpose and get the mean value of the MFCC?

Thilina Dissanayake · Accepted Answer

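librosa.feature.mfcc returns an array of shape (n_mfcc, n_frames), so mfcc.T has shape (n_frames, n_mfcc), and np.mean(mfcc.T, axis=0) averages over the time frames, leaving one mean value per mel-coefficient (13 values here). This collapses a recording of any length into a fixed-length summary. Averaging along the other axis instead gives the mean energy of each time frame, which sometimes helps in visualizing how the characteristic energy differences are distributed along the time axis.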
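
A minimal sketch of what the averaging does to the array shapes. It uses a random NumPy array as a stand-in for the MFCC matrix (an assumption, so that it runs without an audio file), and the frame count of 44 is arbitrary:

```python
import numpy as np

# Stand-in for the MFCC matrix: shape (n_mfcc, n_frames).
# The values are random; only the shapes matter here.
rng = np.random.default_rng(0)
n_mfcc, n_frames = 13, 44  # 44 frames is an arbitrary choice
mfcc = rng.normal(size=(n_mfcc, n_frames))

# Transpose to (n_frames, n_mfcc), then average over the frames (axis 0).
mean_per_coefficient = np.mean(mfcc.T, axis=0)

# Same result without the transpose: average over the frame axis directly.
mean_without_transpose = np.mean(mfcc, axis=1)

print(mean_per_coefficient.shape)  # (13,)
print(np.allclose(mean_per_coefficient, mean_without_transpose))  # True
```

So the transpose is a matter of convention (one row per frame) rather than a requirement: averaging over axis 1 of the untransposed array gives the same 13 values.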

As an example, panel (a) of the figure from the publication cited below shows the 21-order mel-spectrum of some noises, and panel (b) shows the mean energy of each time frame. This visualization helps to pick out the human voice recorded between 1.25 s and 1.5 s.
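
The per-frame mean energy described above can be sketched as follows. This is only an illustration on synthetic data, not the paper's recording: the 21 x 100 shape, the boosted frames 50 to 59, and the 2.5 threshold are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bands, n_frames = 21, 100  # hypothetical sizes
spectrum = rng.random(size=(n_bands, n_frames))  # background values in [0, 1)
spectrum[:, 50:60] += 5.0  # pretend frames 50-59 contain a louder event

# Average over the bands (axis 0) to get one energy value per time frame.
mean_per_frame = spectrum.mean(axis=0)

# Frames whose mean energy stands out from the background.
loud_frames = np.flatnonzero(mean_per_frame > 2.5)

print(mean_per_frame.shape)  # (100,)
print(loud_frames)  # [50 51 52 53 54 55 56 57 58 59]
```

Plotting mean_per_frame against time would give a curve in which the louder segment stands out as a plateau.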

As mentioned in the comments, this step is not compulsory; whether to use it depends on your use case.

The figure was taken from the following publication.

Bi, Chongguang, et al. "Familylog: A mobile system for monitoring family mealtime activities." 2017 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 2017.
