Reputation: 255
I'm trying to build a speaker identification system on Android. Currently I'm using libxtract to compute MFCC vectors from frames and libsvm for classification.
Do you have any idea how to use libxtract, or another small C/C++ library that I can compile with the NDK, to detect voice activity (VAD, Voice Activity Detection) in frames?
Upvotes: 3
Views: 3084
Reputation: 2661
How about LibVAD? www.libvad.com
It seems to do exactly what you're describing.
Disclosure: I'm the developer behind LibVAD
Upvotes: 0
Reputation: 1313
The Voicebox toolkit has a good VAD implementation that uses a few of the techniques Jamie describes. You can find it in vadsohn.m, which implements "A Statistical Model-Based Voice Activity Detection" (1999) by Sohn et al.
You can also find other implementations on GitHub, for example of the G.729 codec VAD (used in VoIP applications). See, for example, this master's thesis.
These implementations are in MATLAB/Octave, but can be ported to C/C++ with a bit of work. Good luck!
Upvotes: 1
Reputation: 5306
Robust VAD is a non-trivial problem, and there are many approaches.
The approach you take depends on factors such as:
A simple approach might involve taking a "bag of features" for each audio frame after noise reduction (e.g. f0, noisiness, the magnitudes of the first 10 partials) and training a machine learning algorithm (an SVM would suffice) on a wide selection of voice and non-voice examples.
However, it is probably best not to treat VAD as a simple frame-wise audio classification problem, but to take the time-varying aspects of the audio into account. This gives you a better estimate of where speech segments begin and end. For this you could use an envelope follower or spectral flux. You could set a high and a low threshold on the envelope values and use them, for example, to control a gate on the audio stream: open it when the envelope rises above the high threshold and close it when it falls below the low one.
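Here is a minimal sketch of the envelope-follower-plus-gate idea, assuming mono float samples. The envelope is a one-pole smoother on the rectified signal with separate attack and release coefficients (closer to 1 means slower), and the two thresholds give hysteresis so the gate does not chatter around a single level. All names and parameter values are hypothetical and would need tuning for your signal level and sample rate.

```c
#include <math.h>
#include <stddef.h>

typedef struct {
    double env;     /* current envelope value */
    double attack;  /* smoothing coefficient while the level rises (0..1) */
    double release; /* smoothing coefficient while the level falls (0..1) */
    double high;    /* open the gate when the envelope exceeds this */
    double low;     /* close the gate when the envelope drops below this (low < high) */
    int open;       /* 1 while speech is considered active */
} EnvelopeGate;

void gate_init(EnvelopeGate *g, double attack, double release, double high, double low)
{
    g->env = 0.0;
    g->attack = attack;
    g->release = release;
    g->high = high;
    g->low = low;
    g->open = 0;
}

/* Processes n samples: writes the gated signal to out and, if active is
 * non-NULL, the per-sample gate state (1 = speech, 0 = silence). */
void gate_process(EnvelopeGate *g, const float *in, float *out, int *active, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double level = fabs((double)in[i]);
        double coeff = (level > g->env) ? g->attack : g->release;
        g->env = coeff * g->env + (1.0 - coeff) * level;

        /* hysteresis: different thresholds for opening and closing */
        if (!g->open && g->env > g->high)
            g->open = 1;
        else if (g->open && g->env < g->low)
            g->open = 0;

        out[i] = g->open ? in[i] : 0.0f;
        if (active)
            active[i] = g->open;
    }
}
```

A fast attack with a slow release makes the gate open quickly at the start of a word and stay open through short pauses; the transitions in `active` then mark candidate speech segment boundaries.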
Upvotes: 1