R - How to speed up Euclidean distance calculation on a very large dataset

Question

community,

I have a very large dataset with 3 coordinate columns (x, y, z) and 24 x 10^6 rows. I need to calculate the Euclidean distance between every row and the first row, which is (0, 0, 0). With the loop below this takes a very long time. I also tried it on a matrix instead of a data frame, but that did not solve the problem.

Does anyone have suggestions to speed up this process?

library(cluster)

e <- list() # list to be filled with Euclidean distances

for (r in 1:(nrow(pca.123.df))) {
  # Euclidean distance between row r (anomaly) and row 1 (zero)
  eucl.dist <- daisy(pca.123.df[c(1, r), ], metric = "euclidean")
  e[[r]] <- eucl.dist[1]
}

Answers (1)
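
One possible approach, offered as a minimal sketch and not as the answer originally posted here: the Euclidean distance from each row to a fixed reference row is the square root of the row-wise sum of squared differences, so it can be computed in one vectorized pass without calling daisy() inside a loop. The small pca.123.df below is a hypothetical stand-in for the real data, and the sketch assumes all three columns are numeric and that no standardization is wanted (the default for daisy() with metric = "euclidean").

```r
# Hypothetical stand-in for the real pca.123.df; row 1 is the origin (0, 0, 0)
pca.123.df <- data.frame(x = c(0, 3, 1),
                         y = c(0, 4, 2),
                         z = c(0, 0, 2))

m   <- as.matrix(pca.123.df)  # numeric matrix, one row per point
ref <- m[1, ]                 # reference point (row 1)

# Subtract the reference from every row, then take the row-wise norm
diffs <- sweep(m, 2, ref)     # sweep() subtracts ref[j] from column j by default
e     <- sqrt(rowSums(diffs^2))

e  # numeric vector: distance of each row to row 1
```

Because the reference row here is exactly (0, 0, 0), the subtraction can be dropped and sqrt(rowSums(m^2)) gives the same result while avoiding one extra copy of the matrix, which may matter with 24 x 10^6 rows. The result is a plain numeric vector instead of a list.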