Lif

Reputation: 35

Confused about MFCC processing

So I extracted features from an audio file with MFCC using the Librosa library in Python. This is what the code looks like:

import librosa
import numpy as np

signal, sample_rate = librosa.load('../audio_train/down/00176480_nohash_0.wav', sr=22050)
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
np.mean(mfcc.T, axis=0)

My question is, why do we have to transpose and get the mean value of the MFCC?

Upvotes: 0

Views: 640

Answers (1)

Thilina Dissanayake

Reputation: 333

Taking the mean of the transposed MFCC matrix gives the mean energy of each mel coefficient over time. This sometimes gives a better visualization of how the characteristic energy differences are distributed along the time axis.
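To see what the transpose and the mean do to the array shapes, here is a minimal sketch. It assumes the usual (n_mfcc, n_frames) layout that librosa.feature.mfcc returns; the matrix is random data standing in for real MFCCs, and the frame count of 44 is an arbitrary choice for illustration.

```python
import numpy as np

# Stand-in for an MFCC matrix: random values, NOT real audio features.
# Assumed layout: (n_mfcc, n_frames), one column per time frame.
n_mfcc, n_frames = 13, 44
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(n_mfcc, n_frames))

# After the transpose, time frames sit on axis 0, so axis=0 averages over time.
per_coefficient = np.mean(mfcc.T, axis=0)       # shape (13,)

# The same numbers without the transpose: average along the frame axis directly.
same_without_transpose = np.mean(mfcc, axis=1)  # shape (13,)

# Averaging over the coefficients instead gives one value per time frame.
per_frame = np.mean(mfcc, axis=0)               # shape (44,)

print(per_coefficient.shape, per_frame.shape)
```

So the transpose is only a matter of which axis you then average over. The per-coefficient vector has the same length no matter how long the clip is, which is one common reason for averaging over time before passing features to a classifier.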

As an example, panel (a) of the following figure shows the 21-order mel-spectrum of some noises, and panel (b) shows the mean energy of each time frame. This visualization helps to distinguish the human voice recorded between 1.25 s and 1.5 s.

[Figure: (a) 21-order mel-spectrum; (b) mean energy of each time frame]

As mentioned in the comments, this step is not compulsory; whether you need it depends entirely on your use case.

The figure was taken from the following publication.

Bi, Chongguang, et al. "Familylog: A mobile system for monitoring family mealtime activities." 2017 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 2017.

Upvotes: 1
