dkantor

Reputation: 162

Counting distance takes too long

I have a data frame with more than 10 million rows. I want to compute the distance between the two lat-lon pairs in each row and store it in a new column. I tried the script below, but it takes far too long (more than 5 hours). Any tips on how I can speed this up? I use the geosphere package to compute the distances.

for (i in seq_len(nrow(dm_kekk)))
{
dm_kekk$dist[i]<-distm (c(dm_kekk$lon[i], dm_kekk$lat[i]), 
                         c(dm_kekk$lon_ok[i], dm_kekk$lat_ok[i]), 
                         fun = distHaversine)

}

Thanks!!!

Upvotes: 3

Views: 573

Answers (1)

Hanjo Odendaal

Reputation: 1441

Always include a small sample of your data and the output you expect; it makes the question much easier to answer. One option is to parallelize the loop; another is to use dplyr's mutate.

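library(doParallel)  # also attaches foreach and parallel

# Leave one core free for the rest of the system
cores <- detectCores() - 1
cl <- makeCluster(cores)
registerDoParallel(cl)

# One distance per row; .packages loads geosphere on every worker
oper_dist <- foreach(i = seq_len(nrow(dm_kekk)), .packages = "geosphere") %dopar% {
  distm(c(dm_kekk$lon[i], dm_kekk$lat[i]),
        c(dm_kekk$lon_ok[i], dm_kekk$lat_ok[i]),
        fun = distHaversine)
}
stopCluster(cl)

# Flatten the list of 1x1 matrices into a numeric vector
dm_kekk$dist <- do.call(c, oper_dist)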

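Or use mutate. Call distHaversine directly here: it is vectorized over the rows of two (lon, lat) matrices, whereas distm computes a full pairwise distance matrix.

library(dplyr)
library(geosphere)

dm_kekk <- dm_kekk %>%
  mutate(dist = distHaversine(cbind(lon, lat), cbind(lon_ok, lat_ok)))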
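
Since distHaversine accepts two-column matrices of (lon, lat) and pairs them row by row, the explicit loop can probably be dropped entirely. Here is a minimal sketch on made-up data: the toy data frame and its coordinates are hypothetical, only the column names are taken from the question, and it assumes the columns hold plain numeric degrees. It checks that a single vectorized call gives the same result as the row-by-row distm approach.

```r
library(geosphere)

# Hypothetical toy data using the question's column names
toy <- data.frame(
  lon    = c(19.04, 21.63, 20.15),
  lat    = c(47.50, 47.53, 46.25),
  lon_ok = c(19.05, 21.60, 20.15),
  lat_ok = c(47.49, 47.55, 46.25)
)

# Vectorized: row i of the first matrix is paired with row i of the second
toy$dist <- distHaversine(cbind(toy$lon, toy$lat),
                          cbind(toy$lon_ok, toy$lat_ok))

# Row-by-row reference, mirroring the loop from the question
loop_dist <- vapply(seq_len(nrow(toy)), function(i) {
  distm(c(toy$lon[i], toy$lat[i]),
        c(toy$lon_ok[i], toy$lat_ok[i]),
        fun = distHaversine)[1, 1]
}, numeric(1))

# Both approaches should agree (distances are in metres)
stopifnot(isTRUE(all.equal(toy$dist, loop_dist)))
```

On the real data the same call would be applied to dm_kekk. Because it is a single vectorized call rather than millions of R-level function calls, it should be much faster than the loop, though I have not timed it on 10 million rows.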

Upvotes: 2
