EmJ

Reputation: 4618

How to get distance matrix using dynamic time warping?

I have 6 time series, as follows.

import numpy as np
series = np.array([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])

I want to compute the dynamic time warping distance matrix for these series so that I can cluster them. I used the dtaidistance library for that, as follows.

from dtaidistance import dtw
ds = dtw.distance_matrix_fast(series)

The output I got was as follows.

array([[       inf, 1.41421356, 2.23606798, 0.        , 1.41421356, 2.23606798],
       [       inf,        inf, 1.73205081, 1.41421356, 0.        , 1.73205081],
       [       inf,        inf,        inf, 2.23606798, 1.73205081, 0.        ],
       [       inf,        inf,        inf,        inf, 1.41421356, 2.23606798],
       [       inf,        inf,        inf,        inf,        inf, 1.73205081],
       [       inf,        inf,        inf,        inf,        inf,        inf]])

It seems to me that the output I get is wrong. For instance, as I understand it, the diagonal values of the output should be 0 (since each series matches itself exactly).

I want to know what I am doing wrong and how to fix it. I am also happy to get answers that use other Python libraries.

I am happy to provide more details if needed.

Upvotes: 4

Views: 3075

Answers (2)

Stef

Reputation: 30609

Everything is correct. As per the docs:

The result is stored in a matrix representation. Since only the upper triangular matrix is required this representation uses more memory then necessary.

All diagonal elements are 0, and the lower triangular matrix is the same as the upper triangular matrix mirrored at the diagonal. As all these values can be deduced from the upper triangular matrix, they aren't filled in and show up as inf in the output.
You can even use the compact=True argument to get only the values from the upper triangular matrix, concatenated into a 1D array.

You can convert the result to a full matrix like this:

ds[ds==np.inf] = 0
ds += ds.T
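
If you would rather not modify ds in place, here is a minimal sketch of the same idea using only NumPy. The matrix is copied from the output in the question, so this does not call dtaidistance at all; the names upper and full are just illustrative.

```python
import numpy as np

inf = np.inf
# Upper-triangular result copied from the output shown in the question.
ds = np.array([
    [inf, 1.41421356, 2.23606798, 0.0,        1.41421356, 2.23606798],
    [inf, inf,        1.73205081, 1.41421356, 0.0,        1.73205081],
    [inf, inf,        inf,        2.23606798, 1.73205081, 0.0],
    [inf, inf,        inf,        inf,        1.41421356, 2.23606798],
    [inf, inf,        inf,        inf,        inf,        1.73205081],
    [inf, inf,        inf,        inf,        inf,        inf],
])

# Keep only the entries strictly above the diagonal; everything else becomes 0.
upper = np.triu(ds, k=1)

# Mirror the upper triangle onto the lower one. The diagonal stays 0.
full = upper + upper.T

print(full)
```

The result is symmetric with zeros on the diagonal, and ds itself still holds the original inf values.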

Upvotes: 7

Arno C

Reputation: 490

In dtw.py the default value for the elements of the distance matrix is np.inf. Since the function only computes the pairwise distances between different sequences, the remaining entries (the diagonal and the lower triangle) are never filled in and keep their np.inf default.

Try running dtw.distance_matrix_fast(series, compact=True) to avoid seeing these filler values.
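
If you go with the compact form and later need a square matrix again (for example, to feed a clustering routine), a sketch like the following rebuilds it with plain NumPy. It assumes the compact array lists the upper-triangle values row by row, which I have not checked against the library source. condensed_to_square is a made-up helper name, and the values are taken from the output in the question rather than from an actual compact=True call.

```python
import numpy as np

def condensed_to_square(condensed, n):
    """Rebuild a symmetric n x n matrix from row-wise upper-triangle values.

    Hypothetical helper: assumes the values are ordered
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
    """
    condensed = np.asarray(condensed, dtype=float)
    if condensed.size != n * (n - 1) // 2:
        raise ValueError("expected n*(n-1)/2 values")
    full = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)  # row-major upper-triangle indices
    full[rows, cols] = condensed          # fill the upper triangle
    full[cols, rows] = condensed          # mirror into the lower triangle
    return full

# Upper-triangle values from the question's output, read row by row.
condensed = [
    1.41421356, 2.23606798, 0.0, 1.41421356, 2.23606798,
    1.73205081, 1.41421356, 0.0, 1.73205081,
    2.23606798, 1.73205081, 0.0,
    1.41421356, 2.23606798,
    1.73205081,
]

square = condensed_to_square(condensed, 6)
print(square)
```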

Upvotes: 2
