pacomet

Reputation: 5141

Clustering time series in R with dtwclust

This is my first attempt at time series clustering and I need some help. I have read about the TSclust and dtwclust packages for time series clustering and decided to try dtwclust.

My data consist of daily temperature time series at different locations (one value per day). I would like to group the locations into spatial clusters based on their temperature series. For my first attempt I copied an example, kept its options, and substituted my own data, temp.max3:

library(dtwclust)

hc<- tsclust(temp.max3, type = "h", k = 20L,
             preproc = zscore, seed = 899,
             distance = "sbd", centroid = shape_extraction,
             control = hierarchical_control(method = "average"))

But this gave me the following error message:

Error in stats::hclust(stats::as.dist(distmat), method, members = dots$members) : NA/NaN/Inf in foreign function call (arg 11)

I had previously removed every NA present in any series, so the resulting temp.max3 data frame does not contain any NA values.

summary(temp.max3)
      8025           8400A            8416            8455      
 Min.   : 6.40   Min.   : 4.60   Min.   : 6.00   Min.   : 4.00  
 1st Qu.:18.80   1st Qu.:17.40   1st Qu.:18.20   1st Qu.:19.00  
 Median :23.20   Median :22.00   Median :22.60   Median :24.00  
 Mean   :23.34   Mean   :22.23   Mean   :22.71   Mean   :23.67  
 3rd Qu.:28.20   3rd Qu.:27.40   3rd Qu.:27.40   3rd Qu.:29.00  
 Max.   :41.40   Max.   :40.60   Max.   :43.00   Max.   :42.00
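
The error message mentions NaN and Inf as well as NA, so a check stricter than reading summary() may be worth running. The following is only a minimal base R sketch on hypothetical toy data (the names toy, keep and toy_clean are made up for illustration and are not part of the original script):

```r
# Hypothetical toy data: one NA and one Inf planted on purpose.
toy <- data.frame(a = c(16.0, 17.8, NA, 17.2),
                  b = c(14.0, Inf, 15.2, 15.0))

# anyNA() catches NA and NaN, but not Inf.
anyNA(toy)

# is.finite() is FALSE for NA, NaN, Inf and -Inf alike.
m <- as.matrix(toy)
all(is.finite(m))

# Keep only the rows in which every value is finite.
keep <- apply(m, 1, function(r) all(is.finite(r)))
toy_clean <- toy[keep, , drop = FALSE]
```

If both checks pass on the real data, the non-finite values are probably being produced later, during preprocessing or the distance computation, rather than being present in the input.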

The data look like this:

head(temp.max3)
      8025 8400A 8416 8455
13127 16.0  14.0 13.5   14
13128 17.8  15.6 17.4   20
13129 18.2  15.2 19.2   18
13130 17.2  15.0 17.6   19
13131 17.0  13.8 15.6   17
13132 21.0  14.0 18.2   19

where 8025, 8400A, 8416 and 8455 are the station codes (just four for now, but this will extend to 120 for the final analysis). The data can be found at this Dropbox link: https://www.dropbox.com/s/xru4qnz8grhbxuo/data.csv?dl=0

Any ideas, links to information, or examples would be greatly appreciated. Thanks in advance.

Upvotes: 0

Views: 3495

Answers (1)

pacomet

Reputation: 5141

Thanks to Alexis's comment, the error message disappeared and the script ran fine. The working version transposes the data, so that each station is a row, and uses k = 2L:

library(dtwclust)

# Transpose so that each row holds one station's series
temp.max4 <- t(temp.max3)

hc <- tsclust(temp.max4, type = "h", k = 2L,
              preproc = zscore, seed = 899,
              distance = "sbd", centroid = shape_extraction,
              control = hierarchical_control(method = "average"))

with this output (image not reproduced here).
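
As far as I understand the dtwclust documentation, tsclust coerces matrices and data frames to a list of series row-wise (see tslist()), so each row has to hold one complete series. The sketch below uses only base R and the six rows shown in the question's head() output to illustrate what the transpose changes. The name series_list and the manual lapply() split are illustrative stand-ins, not the package's own code:

```r
# Stand-in for temp.max3: rows are days, columns are stations.
# Values are the six rows printed by head() in the question.
temp.max3 <- data.frame(
  `8025`  = c(16.0, 17.8, 18.2, 17.2, 17.0, 21.0),
  `8400A` = c(14.0, 15.6, 15.2, 15.0, 13.8, 14.0),
  `8416`  = c(13.5, 17.4, 19.2, 17.6, 15.6, 18.2),
  `8455`  = c(14, 20, 18, 19, 17, 19),
  check.names = FALSE
)
rownames(temp.max3) <- 13127:13132

# Without the transpose, a row-wise split yields one "series" per day,
# each holding only as many values as there are stations.
dim(temp.max3)

# After the transpose, each row is one station's full series.
temp.max4 <- t(temp.max3)
dim(temp.max4)

# Row-wise split, mimicking how a matrix becomes a list of series.
series_list <- lapply(seq_len(nrow(temp.max4)), function(i) temp.max4[i, ])
names(series_list) <- rownames(temp.max4)

length(series_list)   # number of series = number of stations
lengths(series_list)  # length of each series = number of days
```

This also suggests why k had to change: with only four station series, asking for 20 clusters is not meaningful, whereas the untransposed data offered thousands of very short "series".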

Alexis, I'm sorry that I cannot accept a comment as the solution.

Upvotes: 2
