I just started reading about DTW, and decided to try two Python packages, fastdtw and dtaidistance.
Consider a multiclass time series classification problem (the classes are 0, 1, 3 and 4). Samples for classes 1, 3 and 4 are generated from the Class 0 samples, like this:
import numpy as np
from scipy.spatial.distance import euclidean  # used as dist= in the fastdtw calls
from fastdtw import fastdtw
from dtaidistance import dtw
np.random.seed(42)
# Original Class 0 samples (10 samples with 48 half-hourly measurements each)
class_0_samples = np.random.rand(10, 48)
# Generate Class 1 samples (multiply each sample by a random value between [0, 0.8])
class_1_samples = class_0_samples * np.random.uniform(0, 0.8, size=(10, 1))
# Generate Class 3 samples (multiply each half-hourly measurement by a different random value between [0, 0.8])
class_3_samples = class_0_samples * np.random.uniform(0, 0.8, size=(10, 48))
# Generate Class 4 samples (multiply specific columns by a random value between [0, 0.8])
class_4_samples = class_0_samples.copy()
start_cols = np.random.randint(7, 15, size=(10,))
for i in range(10):
    start_col = start_cols[i]
    class_4_samples[i, start_col:start_col+4] = class_4_samples[i, start_col:start_col+4]*np.random.uniform(0, 0.8)
Now, I have tried to determine the DTW distance from each Class 0 sample to the sample with the same index in Classes 0, 1, 3 and 4. Using fastdtw, I wrote this code:
# DTW distance from each Class 0 sample to the same-index sample of Classes 0, 1, 3 and 4
fastdtw_distances_class_0 = []
fastdtw_distances_class_1 = []
fastdtw_distances_class_3 = []
fastdtw_distances_class_4 = []
for i in range(10):
    # reshape(1, -1) passes each series to fastdtw with shape (1, 48)
    distance0, _ = fastdtw(class_0_samples[i].reshape(1, -1), class_0_samples[i].reshape(1, -1), dist=euclidean)
    fastdtw_distances_class_0.append(distance0)
    distance1, _ = fastdtw(class_0_samples[i].reshape(1, -1), class_1_samples[i].reshape(1, -1), dist=euclidean)
    fastdtw_distances_class_1.append(distance1)
    distance3, _ = fastdtw(class_0_samples[i].reshape(1, -1), class_3_samples[i].reshape(1, -1), dist=euclidean)
    fastdtw_distances_class_3.append(distance3)
    distance4, _ = fastdtw(class_0_samples[i].reshape(1, -1), class_4_samples[i].reshape(1, -1), dist=euclidean)
    fastdtw_distances_class_4.append(distance4)
# Convert distances to a numpy array
fastdtw_distances_class_0 = np.array(fastdtw_distances_class_0).reshape(-1, 1)
fastdtw_distances_class_1 = np.array(fastdtw_distances_class_1).reshape(-1, 1)
fastdtw_distances_class_3 = np.array(fastdtw_distances_class_3).reshape(-1, 1)
fastdtw_distances_class_4 = np.array(fastdtw_distances_class_4).reshape(-1, 1)
And for dtaidistance, I wrote this:
# DTW distance from each Class 0 sample to the same-index sample of Classes 0, 1, 3 and 4
dtaidtw_distances_class_0 = []
dtaidtw_distances_class_1 = []
dtaidtw_distances_class_3 = []
dtaidtw_distances_class_4 = []
for i in range(10):
    # Each series is passed to dtaidistance as a 1-D array of shape (48,)
    distance0 = dtw.distance_fast(class_0_samples[i], class_0_samples[i])
    dtaidtw_distances_class_0.append(distance0)
    distance1 = dtw.distance_fast(class_0_samples[i], class_1_samples[i])
    dtaidtw_distances_class_1.append(distance1)
    distance3 = dtw.distance_fast(class_0_samples[i], class_3_samples[i])
    dtaidtw_distances_class_3.append(distance3)
    distance4 = dtw.distance_fast(class_0_samples[i], class_4_samples[i])
    dtaidtw_distances_class_4.append(distance4)
# Convert distances to a numpy array
dtaidtw_distances_class_0 = np.array(dtaidtw_distances_class_0).reshape(-1, 1)
dtaidtw_distances_class_1 = np.array(dtaidtw_distances_class_1).reshape(-1, 1)
dtaidtw_distances_class_3 = np.array(dtaidtw_distances_class_3).reshape(-1, 1)
dtaidtw_distances_class_4 = np.array(dtaidtw_distances_class_4).reshape(-1, 1)
If you run all that, you can see that some distances are the same, but some aren't. For example, if we check the fastdtw and dtaidistance distances between sample 0 of Class 0 and sample 0 of Class 4, we can see they are exactly the same:
fastdtw_distances_class_4[0]
Out[12]: array([0.35720446])
dtaidtw_distances_class_4[0]
Out[13]: array([0.35720446])
But for some other samples they are different, for example Sample 1:
fastdtw_distances_class_4[1]
Out[15]: array([0.57095312])
dtaidtw_distances_class_4[1]
Out[16]: array([0.48818216])
And in other cases they are significantly different, for example for any pair of samples between Class 0 and Class 3:
fastdtw_distances_class_3[8]
Out[17]: array([3.27973515])
dtaidtw_distances_class_3[8]
Out[18]: array([2.11034766])
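One way I thought of to narrow this down is to compare both packages against a dependency-free reference. The sketch below is only a textbook DTW, not the code of either package: `dtw_reference` and `euclidean_1d` are hypothetical helper names, the data is illustrative rather than the samples above, and the cost convention (squared pointwise difference, square root of the accumulated cost) is my assumption about what dtaidistance computes.

```python
import math
import random


def dtw_reference(a, b):
    """Textbook O(len(a) * len(b)) DTW between two 1-D sequences.

    Assumption: the pointwise cost is the squared difference, and the result
    is the square root of the accumulated cost along the best warping path.
    """
    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return math.sqrt(acc[n][m])


def euclidean_1d(a, b):
    """Plain Euclidean distance between two equal-length sequences (no warping)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative data only: 48 points, each scaled by its own random factor.
rng = random.Random(0)
base = [rng.random() for _ in range(48)]
scaled = [x * rng.uniform(0, 0.8) for x in base]

d_dtw = dtw_reference(base, scaled)
d_euc = euclidean_1d(base, scaled)
print(f"reference DTW: {d_dtw:.6f}, Euclidean: {d_euc:.6f}")
```

For equal-length series the reference DTW can never exceed the Euclidean distance, because the diagonal alignment is one of the allowed warping paths. If, on the real samples, one package tracks `euclidean_1d` and the other tracks `dtw_reference`, that would suggest the gap comes from how the inputs are shaped or how the pointwise cost is defined, rather than from a bug in either implementation. The helpers should also accept 1-D NumPy arrays, since they only index and iterate.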
What could be the cause of this? Is there something wrong with my code, or are the DTW implementations in these packages really different from each other? How should I choose between them in this case?