Reputation: 11
I'm currently working on a speech recognition project in which mouse events such as right click, left click, and double click are triggered by voice commands. As a first step, my supervisor told me to extract the features of each voice command using Mel-frequency cepstral coefficients (MFCC) and store the extracted features in a text file in LIBSVM format. I have implemented MFCC using some references from the internet, but I'm not sure whether the implementation is correct, or what the output of MFCC should look like. My program gives something like this when I say 'Right':
e.g -15.211534 8.230449 2.150475 4.000576 -0.037819 -1.083192 0.102314 0.232710 -0.813507 -0.349909 0.850858
Upvotes: 1
Views: 3589
Reputation: 2243
When analyzing speech, most contemporary solutions use a sequence of MFCC vectors, one per short frame of the signal, not just a single vector. In general, computing the MFCCs for one frame goes like this:
complexSpectrum = fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = melFilterBank(powerSpectrum)
logSpectrum = log(filteredSpectrum)
dctSpectrum = dct(logSpectrum)
and you do this on a 30 ms window, sliding along the signal in steps of 10 ms, so each utterance yields one coefficient vector per window position.
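To make the steps above concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from SPro or Sphinx. The Hamming window, the 20 triangular filters, the 12 output coefficients, the floor inside the log, and all function names are my own assumptions for illustration, and the naive DFT is only suitable for a demo.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), demo only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filter_bank(num_filters, num_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed layout)."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # Filter edge frequencies expressed as (fractional) spectrum bin positions.
    edges = [mel_to_hz(max_mel * i / (num_filters + 1)) / nyquist * (num_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(num_bins):
            if left < k <= center:
                weights.append((k - left) / (center - left))
            elif center < k < right:
                weights.append((right - k) / (right - center))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(num_coeffs)]

def mfcc(signal, sample_rate, win_ms=30, step_ms=10,
         num_filters=20, num_coeffs=12):
    """Return one list of coefficients per 30 ms window, stepped by 10 ms."""
    win = int(sample_rate * win_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    bank = mel_filter_bank(num_filters, win // 2 + 1, sample_rate)
    # Hamming window: an assumption, not part of the pseudocode above.
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (win - 1))
               for t in range(win)]
    frames = []
    for start in range(0, len(signal) - win + 1, step):
        frame = [signal[start + t] * hamming[t] for t in range(win)]
        power = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(weights, power))
                    for weights in bank]
        # Floor the energies so log() never sees zero.
        log_energies = [math.log(max(e, 1e-10)) for e in energies]
        frames.append(dct(log_energies, num_coeffs))
    return frames

# 100 ms of a 440 Hz tone sampled at 8 kHz: 8 windows of 12 coefficients.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 10)]
features = mfcc(tone, rate)
print(len(features), len(features[0]))  # prints: 8 12
```

The point to take away is the shape of the result: a list of vectors, one per window, rather than a single vector for the whole recording. For real audio you would swap the naive DFT for a proper FFT.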
As for the precise implementation, you can learn from the code of the sfbcep utility in SPro, which is written in C, or from Sphinx if you find Java more familiar.
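For the storage part of the question: a LIBSVM text file has one sample per line, written as a label followed by index:value pairs with indices starting at 1. A rough sketch of formatting one feature vector that way is below. The function name and the label number 1 for 'Right' are hypothetical, and the three values are just the first three numbers from the question.

```python
def to_libsvm_line(label, features):
    """Format one feature vector as '<label> 1:<v1> 2:<v2> ...'."""
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"

# Hypothetical label 1 for the command 'Right'.
line = to_libsvm_line(1, [-15.211534, 8.230449, 2.150475])
print(line)  # prints: 1 1:-15.211534 2:8.230449 3:2.150475
```

Every line in the file needs the same number of features, so you still have to decide how to turn the per-window vectors of one utterance into a single fixed-length vector.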
Upvotes: 1