Haroon Lone
Haroon Lone

Reputation: 2949

Reduce run time for DTW distance function

I want to compute DTW distance between columns of a data matrix. But current implementation takes terribly long time. Is there any other implementation of dtw which takes less time?

Here is the dummy data:

df <- data.frame(d1= rnorm(1500,10,5),d2= rnorm(1500,130,10),d3= rnorm(1500,200,10),d4= rnorm(1500,120,15),d5= rnorm(1500,700,25),d6= rnorm(1500,6,2),d7= rnorm(1500,760,15),d8= rnorm(1500,3000,08),d9= rnorm(1500,490,15),d10= rnorm(1500,321,21))

This function returns distance matrix using DTWDistance() function:

compute_dtw_distance_matrix  <- function(data_mat){
  library(TSdist) # for DTWDistance function
  cols = dim(data_mat)[2] # no. of columns or features
  dis_mat = matrix(0,nrow=cols,ncol=cols) # create result matrix
  # Here, I will compute only lower triangular matrix, later I will copy values to full matrix.
  # compute only lower traingular matrix
  for(row in 1:cols){
    ref_col = data_mat[,row]
    for(col in 1:row){
      comp_col = data_mat[,col]
      dis_mat[row,col] = DTWDistance(ref_col, comp_col)
    }
  }
  # convert lower_triangular to full_symmetric matrix
  for(i in 1:NROW(dis_mat)){
    for(j in 1:NCOL(dis_mat)){
      dis_mat[i,j] = dis_mat[j,i] 
    }
  }
  colnames(dis_mat) <- colnames(data_mat)
  row.names(dis_mat) <- colnames(data_mat)
  return(dis_mat)
}

Here are the running time statistic of this function on my machine:

 system.time(compute_dtw_distance_matrix(df))
       user  system elapsed 
     21.500   3.049  24.723 

Is it possible to reduce the running time of this function?

Upvotes: 0

Views: 674

Answers (2)

BajajG
BajajG

Reputation: 2174

I know this is an old question but I was looking for ways to speed up the distance matrix computation in R. I came across RcppParallel package which can be used with several distance functions for computing the distances. Details can be found at https://cran.r-project.org/web/packages/dtwclust/vignettes/parallelization-considerations.html

Upvotes: 1

Aeck
Aeck

Reputation: 553

You could use the parallelDist package which supports the parallel calculation of multi-dimensional Dynamic Time Warping distances.

The parDist function currently takes a list of matrices as an input argument for processing multi-dimensional data sets.

matrices.list <- lapply(as.list(df), function(x) t(as.matrix(x)))

parDist produces the same output in ~ 0.5s with 8 threads compared to 18.44s with the compute_dtw_distance_matrix function, using the following arguments:

res1 <- compute_dtw_distance_matrix(df)
res2 <- parDist(matrices.list, method = "dtw", step.pattern="symmetric2", window.type="none", upper = T, diag = T, threads = 8)
all.equal(as.matrix(res1), as.matrix(res2))

Here is a microbenchmark which includes different number of threads.

    expr        min         lq
                                                                                                  compute_dtw_distance_matrix(df) 17.9424328 18.1571842
 parDist(matrices.list, method = "dtw", step.pattern = "symmetric2",      window.type = "none", upper = T, diag = T, threads = 8)  0.5280135  0.5434037
 parDist(matrices.list, method = "dtw", step.pattern = "symmetric2",      window.type = "none", upper = T, diag = T, threads = 4)  0.6869948  0.6999783
 parDist(matrices.list, method = "dtw", step.pattern = "symmetric2",      window.type = "none", upper = T, diag = T, threads = 2)  1.0311007  1.0646326
 parDist(matrices.list, method = "dtw", step.pattern = "symmetric2",      window.type = "none", upper = T, diag = T, threads = 1)  1.6967269  1.7057925
       mean     median         uq        max neval
 18.4489183 18.4901471 18.6747947 18.9852819    10
  0.5547146  0.5568046  0.5657859  0.5727592    10
  0.7266116  0.7276621  0.7446920  0.7597008    10
  1.0796176  1.0742217  1.0812031  1.1792582    10
  1.7358018  1.7148310  1.7695766  1.8238875    10

Upvotes: 1

Related Questions