chathux

Reputation: 831

Compare two spoken words with MFCC and DTW using Aquila library

I am trying to find the similarity between spoken words using the Aquila library. My current approach is as follows:
1) First I break the spoken word into smaller frames.
2) Then I apply MFCC to each frame and store the result in a vector.
3) Finally I calculate the distance using DTW.

This is the code I am using:

// Aquila 3 headers (paths may differ slightly between versions)
#include "aquila/source/WaveFile.h"
#include "aquila/source/FramesCollection.h"
#include "aquila/transform/Mfcc.h"
#include "aquila/ml/Dtw.h"
#include <iostream>
#include <vector>

using namespace std;

int frame_size = 1024;

// First recording: split into frames, compute MFCCs per frame
Aquila::WaveFile waveIn0("start_1.wav");
Aquila::FramesCollection frameCollection0(waveIn0, frame_size);
vector<vector<double>> dtwdt0;
Aquila::Mfcc mfcc0(frame_size);
for (int i = 0; i < frameCollection0.count(); i++)
{
    Aquila::Frame frame = frameCollection0.frame(i);
    vector<double> mfccValues = mfcc0.calculate(frame);
    dtwdt0.push_back(mfccValues);
}

// Second recording: same processing
Aquila::WaveFile waveIn1("start_2.wav");
Aquila::FramesCollection frameCollection1(waveIn1, frame_size);
vector<vector<double>> dtwdt1;
Aquila::Mfcc mfcc1(frame_size);
for (int i = 0; i < frameCollection1.count(); i++)
{
    Aquila::Frame frame = frameCollection1.frame(i);
    vector<double> mfccValues = mfcc1.calculate(frame);
    dtwdt1.push_back(mfccValues);
}

// DTW distance between the two MFCC sequences
Aquila::Dtw dtw(Aquila::euclideanDistance, Aquila::Dtw::PassType::Diagonals);
double distance_1 = dtw.getDistance(dtwdt0, dtwdt1);
cout << "Distance : " << distance_1 << endl;

It works, except that it is not accurate enough: sometimes it reports a smaller distance between the spoken words 'start' and 'stop' than between two utterances of 'start'.

Is my code correct? How can I improve the program to get more accurate results? Any help will be appreciated.

Thanks.

Upvotes: 2

Views: 1503

Answers (1)

Nikolay Shmyrev

Reputation: 25220

Overall, DTW is not an easy thing to implement. You might check this lecture to see what must be done:

http://www.fit.vutbr.cz/~grezl/ZRE/lectures/08_reco_dtw_en.pdf

You need to figure out why the distance between 'start' and 'stop' is smaller than between two 'start's. Is it due to different volume, or did you use different voices? There could be many issues. The distance between identical samples must be 0. You might want to dump the frame-by-frame alignment between the samples to see what is aligned to what.

Ideally, DTW should not allow very big jumps between frames; the lecture above describes this.

For better accuracy, the feature extraction pipeline should include a lifter for the cepstrum and cepstral mean normalization (which is essentially volume normalization).

The audio you use should not include silence; you need voice activity detection to strip it out.

Also, I'm not sure about the sample rate of your audio, but a frame size of 1024 samples is probably too large.

Upvotes: 2

Related Questions