user2421640

Reputation: 11

Mel Frequency cepstral coefficient - Speech feature extraction

I'm currently working on a speech recognition project in which mouse events such as right click, left click, double click, etc. are given as voice commands. As a first step, my supervisor told me to extract the features of each voice command using Mel frequency cepstral coefficients (MFCC) and store the extracted features in a text file in LIBSVM format. I have implemented MFCC using some references from the internet, but I'm not sure whether it is correct, and I'm not sure what the output of MFCC should look like. My program gives something like this when I say 'Right':

e.g -15.211534  8.230449    2.150475    4.000576    -0.037819   -1.083192   0.102314    0.232710    -0.813507   -0.349909   0.850858
  1. Can someone explain what kind of output I should get from MFCC?
  2. How do I store the extracted MFCC features in LIBSVM format?
  3. Can someone help me find a correct MATLAB implementation of MFCC for my problem?

Upvotes: 1

Views: 3589

Answers (1)

hruske

Reputation: 2243

When analyzing speech, most contemporary solutions use a series of MFCC vectors (one vector per short frame of the signal), not just a single one. In general, computing the MFCCs for one frame goes like this:

complexSpectrum = fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = melFilterBank(powerSpectrum)
logSpectrum = log(filteredSpectrum)
dctSpectrum = dct(logSpectrum)

You do this on a 30 ms window, sliding along the signal in steps of 10 ms, so each utterance yields one vector of coefficients per window position.
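To make the steps above concrete, here is a minimal sketch in plain Python that uses only the standard library. It is not a reference implementation: the function names, the 20 triangular filters, the 12 output coefficients, and the common 2595 * log10(1 + f/700) mel formula are all assumptions chosen for illustration, and the naive DFT is slow. Real front ends usually also apply pre-emphasis and a tapering window (e.g. Hamming) to each frame, which is left out here to mirror the five steps exactly.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2, using a naive O(N^2) DFT (demo only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filter_bank(num_filters, num_bins, sample_rate):
    """Triangular filters evenly spaced on the mel scale (one weight list each)."""
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    # num_filters + 2 edge frequencies, expressed as fractional bin positions
    edges = [mel_to_hz(top_mel * i / (num_filters + 1)) / nyquist * (num_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for b in range(num_bins):
            if left < b <= center:
                weights.append((b - left) / (center - left))
            elif center < b < right:
                weights.append((right - b) / (right - center))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def mfcc(signal, sample_rate, win_ms=30, step_ms=10,
         num_filters=20, num_coeffs=12):
    """Return one list of num_coeffs cepstral coefficients per window."""
    win = int(sample_rate * win_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    bank = mel_filter_bank(num_filters, win // 2 + 1, sample_rate)
    features = []
    for start in range(0, len(signal) - win + 1, step):
        power = power_spectrum(signal[start:start + win])
        energies = [sum(w * p for w, p in zip(weights, power))
                    for weights in bank]
        # Floor the energies so that log() never sees zero
        log_energies = [math.log(max(e, 1e-10)) for e in energies]
        features.append(dct2(log_energies, num_coeffs))
    return features

# Usage: 50 ms of a 440 Hz tone sampled at 8 kHz (synthetic test input)
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(400)]
features = mfcc(tone, 8000)
print(len(features), "frames of", len(features[0]), "coefficients")
```

The output is a list of frames, each holding a short vector of real numbers that can be positive or negative, which is the same shape as the line of numbers shown in the question.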

As for the precise implementation, you can learn from the code in SPro, written in C (the sfbcep utility), or from Sphinx if you are more familiar with Java.
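As for storing the features: a LIBSVM data file has one sample per line, written as `<label> <index>:<value> <index>:<value> ...`, with feature indices starting at 1 and in ascending order. A minimal sketch of a writer follows; the helper names, the file name, and the mapping of label 1 to the 'Right' command are hypothetical. Note that an SVM expects every sample to have the same number of features, so how the per-window MFCC vectors of one utterance are combined into a single vector (averaging, concatenating a fixed number of frames, and so on) is a design choice this sketch does not make for you.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' (1-based indices)."""
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"

def write_libsvm(path, samples):
    """Write (label, feature_list) pairs to a text file, one sample per line."""
    with open(path, "w") as out:
        for label, features in samples:
            out.write(to_libsvm_line(label, features) + "\n")

# Usage with the first few values from the question; label 1 = 'Right' is an assumption
line = to_libsvm_line(1, [-15.211534, 8.230449, 2.150475])
print(line)
```

Calling `write_libsvm("commands.txt", samples)` with a list of such pairs would then produce a file that tools such as svm-train can read.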

Upvotes: 1
