ilcord

Reputation: 414

Fast way to cluster time series data in R

I'm trying to cluster time-series data: I have about 16000 time-series vectors, each vector is ~1500 samples long.

I tried using the dtw package:

library(dtw)
# Pairwise DTW distances between the rows of time_series
d <- dist(x = time_series, method = "DTW")
hclust(d)

However, the distance matrix calculation did not finish even after running for the whole weekend.

I'm looking for a faster way since my data set will be much larger.

Upvotes: 0

Views: 746

Answers (1)

user2313186

Reputation: 254

Your data is of length 1500. Suppose it is oversampled.

DTW takes time quadratic in the series length, so if you downsample 1 in 2, DTW will be 4 times faster. If you downsample 1 in 4, DTW will be 16 times faster. If you downsample 1 in 10, DTW will be 100 times faster.
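As a rough illustration of those ratios, here is a minimal sketch that keeps every k-th sample and reports the resulting reduction in DTW table cells. It assumes the series are stored as rows of a matrix; `downsample` is a hypothetical helper, the data is random stand-in data, and plain decimation without smoothing may not suit every signal.

```r
# Keep every k-th sample of each series (rows = series, columns = time points).
# NOTE: `downsample` is a hypothetical helper doing plain decimation, no smoothing.
downsample <- function(series, k) {
  series[, seq(1, ncol(series), by = k), drop = FALSE]
}

set.seed(1)
time_series <- matrix(rnorm(5 * 1500), nrow = 5)  # stand-in data: 5 series of length 1500

for (k in c(2, 4, 10)) {
  reduced <- downsample(time_series, k)
  # Unconstrained DTW fills an n-by-n table, so the work scales with n^2.
  speedup <- (ncol(time_series) / ncol(reduced))^2
  cat(sprintf("1 in %d: length %d, about %.0fx fewer DTW cells\n",
              k, ncol(reduced), speedup))
}
```

Whether the clustering survives the downsampling depends on how oversampled the data really is, so it is worth comparing results on a small subset first.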

This might be a good starting point.

Are you using cDTW (constrained DTW) or DTW? The former is significantly faster, and can often be more accurate.
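To make the difference concrete, here is a minimal base R sketch of DTW restricted to a band of half-width `w` around the diagonal. It is written for illustration only: `cdtw` is a hypothetical name, it assumes two equal-length numeric vectors, and it uses absolute difference as the pointwise cost. The dtw package's `dtw()` function has a `window.type` argument for the same purpose, which is likely the better choice in practice.

```r
# DTW restricted to a band |i - j| <= w around the diagonal (cDTW).
# Assumes two numeric vectors of equal length; `cdtw` is a hypothetical name.
cdtw <- function(x, y, w) {
  n <- length(x)
  stopifnot(length(y) == n, w >= 0)
  D <- matrix(Inf, n + 1, n + 1)  # D[i + 1, j + 1] = cost of best path ending at (i, j)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in max(1, i - w):min(n, i + w)) {  # only cells inside the band
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, n + 1]
}

x <- sin(seq(0, 2 * pi, length.out = 100))
y <- sin(seq(0, 2 * pi, length.out = 100) + 0.3)

cdtw(x, y, w = 5)    # narrow band: roughly (2 * 5 + 1) * 100 cells visited
cdtw(x, y, w = 100)  # band covers the whole table: same as unconstrained DTW
```

The band cuts the work from about n^2 cells to about (2w + 1) * n, and it also rules out extreme warpings, which is one reason the constrained version can cluster better.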

A paper at SIGKDD this week presents a faster way to cluster under DTW by using upper and lower bounds [a].
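The sketch below is not the algorithm from [a]. It only illustrates what cheap upper and lower bounds on a banded DTW distance can look like, using a generic envelope lower bound (in the style of LB_Keogh) and the diagonal path as an upper bound. The function names are hypothetical, and the bounds assume equal-length series, absolute difference as the pointwise cost, and a DTW that counts each matched cell once.

```r
# Lower bound for banded DTW (LB_Keogh style): distance from each x[i]
# to the min/max envelope of y over the window [i - w, i + w].
lb_envelope <- function(x, y, w) {
  n <- length(x)
  idx <- seq_len(n)
  upper <- sapply(idx, function(i) max(y[max(1, i - w):min(n, i + w)]))
  lower <- sapply(idx, function(i) min(y[max(1, i - w):min(n, i + w)]))
  sum(pmax(x - upper, 0) + pmax(lower - x, 0))
}

# Upper bound: the diagonal is itself a valid warping path.
ub_diagonal <- function(x, y) sum(abs(x - y))

x <- sin(seq(0, 2 * pi, length.out = 100))
y <- sin(seq(0, 2 * pi, length.out = 100) + 0.3)

lb <- lb_envelope(x, y, w = 5)
ub <- ub_diagonal(x, y)
c(lower = lb, upper = ub)  # the banded DTW distance lies between these two values
```

Both bounds cost time linear in the series length (times the window for the envelope), so they can be used to skip full DTW computations for pairs whose bounds already settle the comparison.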


However, your matrix is of size (16000 * 15999)/2.

So if you have two days: two days / ((16000 * 15999)/2) = about 1350 microseconds

So you need to do each comparison in roughly 1350 microseconds (1.35 ms), which is not a lot of time. This will be difficult, but it is doable with effort. If you get stuck, email me (I am the last author of [a]).
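The per-comparison budget can be recomputed directly. This sketch assumes the figures quoted above (16000 series, a two-day budget) and divides the time by the whole pair count; the variable names are only illustrative.

```r
n_series <- 16000
n_pairs  <- n_series * (n_series - 1) / 2  # 127,992,000 unordered pairs
budget_s <- 2 * 24 * 60 * 60               # two days in seconds

# Time available per pairwise comparison, in microseconds (roughly 1350)
per_pair_us <- budget_s / n_pairs * 1e6
per_pair_us
```

Because the pair count grows with the square of the number of series, doubling the data set cuts this budget to about a quarter.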

[a] Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn Keogh (2015). Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. SIGKDD 2015.

Upvotes: 3
