Reputation: 595
I need a Java-based feature extraction library and found Sphinx, but I do not know how to work with it. Basically, I need to convert a wav file into mel coefficients. I have done this before in Matlab, but since I'm not very familiar with Java, I couldn't work out how to use their code to extract the features.
By the way, if you happen to know of another open source library that can do this quickly, that would be extremely helpful.
Update: Since I'm going to use this on Android, it might be a better idea to use PocketSphinx instead. I tried downloading their demo app, but it did not run on my device (Nexus 5): it tries to open an activity and closes immediately. I've also followed these steps, but with no success so far.
It would be wonderful if someone could help me figure out how to set that up. I need to know: 1- Which modules should be used? 2- How can I use the library in my own project? 3- How do I set up the library: which functions should be called, and how?
Thanks in advance.
Is there a step by step guide to use
Upvotes: 1
Views: 1422
Reputation: 2507
It is certainly possible to compute MFCC features with sphinx4, but I wouldn't say it will be quick. sphinx4 has the notion of a frontend, which is responsible for processing the input data. A typical frontend looks like this:
<component name="liveFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>dataSource</item>
        <item>dataBlocker</item>
        <item>speechClassifier</item>
        <item>speechMarker</item>
        <item>nonSpeechDataFilter</item>
        <item>preemphasizer</item>
        <item>windower</item>
        <item>fft</item>
        <item>autoCepstrum</item>
        <item>liveCMN</item>
        <item>featureExtraction</item>
        <item>featureTransform</item>
    </propertylist>
</component>
Each element of the frontend reads data from the previous element, processes it, and passes the result to the next one. Here dataSource accepts raw audio input and autoCepstrum outputs the MFCC coefficients. Everything else is related to the particular setup of the speech recognizer. If you want to use sphinx4 to compute MFCCs, you can either set up a similar frontend yourself, instantiating and tuning each component separately, or write an XML configuration and instantiate the frontend using ConfigurationManager.
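As a rough sketch of the XML route, a stripped-down configuration that stops at the cepstrum stage might look something like the one below. Treat it as an assumption-laden starting point rather than a tested setup: the component names (mfccFrontEnd, audioFileDataSource, melFilterBank, dct) are labels I made up, and I have replaced autoCepstrum with an explicit mel filter bank followed by a DCT on the assumption that no acoustic model is loaded. The class names are the ones I recall from the stock sphinx4 configuration files, so check them against the sphinx4 version you are using.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!-- Hypothetical MFCC-only pipeline: everything after the cepstrum stage
         (CMN, delta features, feature transform) is left out on purpose. -->
    <component name="mfccFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource</item>
            <item>preemphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>dct</item>
        </propertylist>
    </component>

    <!-- Reads audio from a file instead of a live microphone. -->
    <component name="audioFileDataSource"
               type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>

    <!-- All components below rely on their default property values. -->
    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>
    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower"/>
    <component name="fft"
               type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform"/>

    <!-- Assumption: these two stand in for autoCepstrum. -->
    <component name="melFilterBank"
               type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank"/>
    <component name="dct"
               type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>
</config>
```

With something like this in place, the idea would be to load the file with ConfigurationManager, look up mfccFrontEnd, point the data source at your wav file, and then pull data objects from the frontend one by one until it reports the end of the stream. Frame size, number of filters, and cepstrum length are all left at their defaults here, so you would probably want to set those properties explicitly to match what you used in Matlab.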
Upvotes: 3