Reputation: 115
I want to do a time series clustering task. Let's say we have four series (t1 to t4).
t1={1,1,1,1,1,1,1}
t2={10,10,10,10,10,10,10}
t3={100,100,100,100,100,100,100}
t4 = {1,5,9,13,17,21,25}
My intention with this example is to group t1, t2, and t3 together, because each of them is a constant line. t4, however, is an ascending line, so it should go in a different group.
But if I compute the distances between t1 and the others using DTW (Python mlpy package), I get the following results:
t1-t1: 0 (as expected)
t1-t2: 63
t1-t3: 693
t1-t4: 84
As we can see, the distance between t1 and t3 is much greater than that between t1 and t4. I guess this is because the amplitude of t3 is much larger than that of the others.
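For reference, the numbers above can be reproduced without mlpy. This is a minimal sketch of textbook DTW, assuming the absolute difference as the local cost and no window constraint; `dtw_distance` is a hypothetical helper written for this post, not the mlpy API, and the choice of cost is inferred only from the fact that it matches the reported distances.

```python
def dtw_distance(a, b):
    """Plain dynamic-programming DTW with |x - y| as the local cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


t1 = [1] * 7
t2 = [10] * 7
t3 = [100] * 7
t4 = [1, 5, 9, 13, 17, 21, 25]

distances = {
    name: dtw_distance(t1, series)
    for name, series in [("t1", t1), ("t2", t2), ("t3", t3), ("t4", t4)]
}
print(distances)
```

Because t1 is constant, warping cannot help here: every point of the other series still has to be matched against the value 1, so the distance is just the sum of the offsets.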
In this situation, is it a good idea to apply min-max normalization (i.e., scaling to the range 0 to 1) to each time series before applying DTW? In other words, turning t1, t2, and t3 into {0,0,0,0,0,0,0}, and t4 into {0, 0.17, ..., 1}? Then DTW returns the result I want.
In short, I am wondering whether normalization is appropriate before DTW. I'm new to DTW, so sorry for bothering you with a basic question! :)
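To make the proposed normalization concrete, here is a small sketch. Note that for a constant series the min-max formula is 0/0, so mapping it to all zeros is a convention assumed here, not something the formula gives you; `min_max_normalize` is a hypothetical helper name.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; constant series become all zeros (assumed convention)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # (x - lo) / (hi - lo) would be 0/0 here, so fall back to zeros.
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


t1 = [1] * 7
t4 = [1, 5, 9, 13, 17, 21, 25]

n1 = min_max_normalize(t1)
n4 = min_max_normalize(t4)
print(n1)
print([round(x, 2) for x in n4])
```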
Upvotes: 3
Views: 3675
Reputation: 86
No, you should use z-normalization instead.
Zero-one (min-max) normalization is very sensitive to a single outlier.
Source: http://www.cs.unm.edu/~mueen/DTW.pdf
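A rough sketch of what that looks like on the series from the question, assuming the population standard deviation and, for constant series (where the standard deviation is 0), a fallback to all zeros. Both choices are assumptions made for illustration; real code often uses a small epsilon threshold instead of an exact zero check. The DTW helper is a plain textbook version with absolute-difference cost, not the mlpy API.

```python
import math


def z_normalize(series):
    """Subtract the mean and divide by the population std; zeros if std is 0 (assumed)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # Constant series: division by zero, so return a flat zero line instead.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Plain dynamic-programming DTW with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


t1 = [1] * 7
t2 = [10] * 7
t3 = [100] * 7
t4 = [1, 5, 9, 13, 17, 21, 25]

z1, z2, z3, z4 = (z_normalize(t) for t in (t1, t2, t3, t4))

d12 = dtw_distance(z1, z2)
d13 = dtw_distance(z1, z3)
d14 = dtw_distance(z1, z4)
print(d12, d13, d14)
```

Under these assumptions the three constant series collapse onto the same flat line, so they are at distance 0 from each other, while t4 keeps its slope and stays clearly separated from them.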
Upvotes: 7