Reputation: 11
I am trying to implement a speech recognition module using Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW).
I divide the signal x(n) into 25 ms frames with an overlap of 10 ms and compute the MFCC parameters for each frame. My main doubt is how to perform DTW in this scenario. Suppose there are M frames and N (= 13) MFCC coefficients per frame.
So I have an M x N matrix. How am I supposed to compute DTW on it?
Upvotes: 1
Views: 1308
Reputation: 73
In your case DTW is used to compare two audio sequences. The sequence to be verified gives you an M1xN matrix and the query an M2xN matrix, so your cost matrix has dimensions M1xM2.
To construct the cost matrix, apply a distance/cost measure between the frames of the two sequences: cost(i,j) = your_chosen_multidimension_metric(M1[i,:], M2[j,:]), where M1[i,:] is frame i of the first matrix and M2[j,:] is frame j of the second.
The resulting cost matrix is 2D, so you can apply DTW to it directly.
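As a rough illustration of the cost-matrix step, here is a minimal sketch that uses plain NumPy broadcasting and the Euclidean distance as the frame metric. The array names `reference` and `query`, the frame counts, and the random data are all made up for the example; real input would be the two MFCC matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=(40, 13))  # hypothetical M1 = 40 frames, N = 13 coefficients
query = rng.normal(size=(55, 13))      # hypothetical M2 = 55 frames, same N

# cost[i, j] = Euclidean distance between frame i of reference and frame j of query.
# Broadcasting builds all M1 x M2 frame differences, then the norm is taken over N.
cost = np.linalg.norm(reference[:, None, :] - query[None, :, :], axis=2)

print(cost.shape)
```

The two sequences may have different numbers of frames, but they must share the same number of coefficients N, otherwise the frame-to-frame distance is undefined.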
I wrote similar code for DTW based on MFCC. Below is a Python implementation that returns the DTW score; x and y are the MFCC matrices of the voice sequences, with dimensions M1xN and M2xN:
import numpy as np
from scipy.spatial.distance import cdist

def my_dtw(x, y):
    # Local cost: standardized Euclidean distance between every pair of frames
    cost_matrix = cdist(x, y, metric='seuclidean')
    m, n = np.shape(cost_matrix)
    # Accumulate in place: each cell adds the cheapest of its allowed predecessors
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue  # the start cell keeps its local cost
            elif i == 0:
                cost_matrix[i, j] += cost_matrix[i, j-1]
            elif j == 0:
                cost_matrix[i, j] += cost_matrix[i-1, j]
            else:
                cost_matrix[i, j] += min(cost_matrix[i-1, j],
                                         cost_matrix[i, j-1],
                                         cost_matrix[i-1, j-1])
    # Accumulated cost of the best warping path ending at the last frame pair
    return cost_matrix[m-1, n-1]
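A quick sanity check can make the score easier to interpret. The sketch below is not the code above: it is a compact NumPy-only variant (hypothetical name `dtw_score`) that uses plain Euclidean distance instead of `'seuclidean'`, applied to random stand-in data rather than real MFCCs. It compares a sequence with a time-stretched copy of itself and with an unrelated sequence.

```python
import numpy as np

def dtw_score(x, y):
    """Accumulated DTW cost between two (frames x coefficients) matrices."""
    # Pairwise Euclidean distance between frames (assumed metric for this sketch)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    m, n = cost.shape
    # One row/column of infinite padding removes the need for border special cases
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[m, n]

rng = np.random.default_rng(1)
word = rng.normal(size=(20, 13))        # stand-in for one utterance
slow_word = np.repeat(word, 2, axis=0)  # every frame duplicated: same content, half speed
other = rng.normal(size=(30, 13))       # stand-in for an unrelated utterance

same = dtw_score(word, slow_word)
diff = dtw_score(word, other)
print(same, diff)
```

Because the stretched copy contains exactly the same frames, the warping path can match them at zero cost, while the unrelated sequence yields a strictly larger score. Raw scores grow with sequence length, so comparing scores across utterances of very different durations may call for some normalization.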
Upvotes: 0
Reputation: 71
The MxN matrix can be represented as a 1D vector of length M*N.
So you have the first pattern,
p1[M*N], len=i, the sound 'silence-HHHEEEEELLLLLOOOOOOOO-silence';
then the second,
p2[M*N], len=j, something like 'HHHHHHEEELLOOOO'.
Then run DTW with a Manhattan, Euclidean, Bray-Curtis, etc. distance calculation. The output is a 2D matrix, and in it there is a path with minimum weight.
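To make the minimum-weight path concrete, here is a small sketch on two toy 1D sequences, using the absolute difference (the Manhattan distance in one dimension) as the local cost. The function name `dtw_path` and the sample values are invented for illustration, and the step pattern (diagonal, up, left) is an assumption, since the text does not specify one.

```python
import numpy as np

def dtw_path(cost):
    """Return (total_cost, path) for a 2D local-cost matrix."""
    m, n = cost.shape
    acc = np.full((m, n), np.inf)
    acc[0, 0] = cost[0, 0]
    # Forward pass: accumulate the cheapest way to reach each cell
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best_prev
    # Backward pass: walk from the last cell to (0, 0) via the cheapest predecessor
    i, j = m - 1, n - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[-1, -1], path

a = np.array([0.0, 1.0, 2.0, 3.0])            # short pattern
b = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 3.0])  # same shape, stretched in time
cost = np.abs(a[:, None] - b[None, :])        # len(a) x len(b) local costs
total, path = dtw_path(cost)
print(total, path)
```

Each pair `(i, j)` in the path says that element `i` of the first pattern is aligned with element `j` of the second; repeated values of `i` show where the second pattern lingers longer on the same sound.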
Upvotes: 2