I have a data frame with more than 10 million rows. I want to compute the distance between each pair of lat-lon points and store it in a new column. I tried the script below, but it takes too long (more than 5 hours). Any tips on how I can speed this up? I use the geosphere package to compute the distances.
library(geosphere)

for (i in seq_len(nrow(dm_kekk))) {
  dm_kekk$dist[i] <- distm(c(dm_kekk$lon[i], dm_kekk$lat[i]),
                           c(dm_kekk$lon_ok[i], dm_kekk$lat_ok[i]),
                           fun = distHaversine)
}
Thanks!!!
Always include a sample of your data and the output you expect; it makes the question much easier to answer.
One option is to parallelize the process; another is to use dplyr's mutate.
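library(doParallel)   # also attaches foreach and parallel

cores <- detectCores() - 1
cl <- makeCluster(cores)
registerDoParallel(cl)

# Loop over every row: seq_len(n) already gives 1, 2, ..., n.
# .packages loads geosphere on each worker; .combine = c returns a plain vector.
oper_dist <- foreach(i = seq_len(nrow(dm_kekk)),
                     .combine = c,
                     .packages = "geosphere") %dopar% {
  distm(c(dm_kekk$lon[i], dm_kekk$lat[i]),
        c(dm_kekk$lon_ok[i], dm_kekk$lat_ok[i]),
        fun = distHaversine)
}

stopCluster(cl)
dm_kekk$dist <- oper_dist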
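Sending 10 million single-row tasks to the workers likely spends much of the time on communication rather than arithmetic. A minimal sketch of a coarser split, assuming the coordinate columns are numeric decimal degrees: cut the rows into one chunk per worker and let each worker compute its whole chunk in a single vectorized call. It uses the base parallel package (parLapply) instead of foreach, and the helper name par_by_chunk is hypothetical.

```r
library(parallel)

# Hypothetical helper: split the rows of `df` into one chunk per worker,
# apply `fun` (takes a data-frame chunk, returns one value per row) to every
# chunk in parallel, and return the results concatenated in row order.
par_by_chunk <- function(df, fun,
                         cores = max(1L, detectCores() - 1L, na.rm = TRUE)) {
  n <- nrow(df)
  k <- min(cores, n)
  # Integer chunk labels 1..k, assigned to consecutive blocks of rows
  chunk_id <- cut(seq_len(n), breaks = k, labels = FALSE)
  chunks <- split(df, chunk_id)          # list of k data frames, in order
  cl <- makeCluster(k)
  on.exit(stopCluster(cl), add = TRUE)   # shut workers down even on error
  unlist(parLapply(cl, chunks, fun), use.names = FALSE)
}

# Usage on the question's data (assumes geosphere is installed). The worker
# function refers only to its argument and geosphere::, so nothing else
# has to be exported to the workers:
# dm_kekk$dist <- par_by_chunk(
#   dm_kekk[, c("lon", "lat", "lon_ok", "lat_ok")],
#   function(d) geosphere::distHaversine(cbind(d$lon, d$lat),
#                                        cbind(d$lon_ok, d$lat_ok))
# )
```

Because each chunk is already vectorized, the extra gain over a single non-parallel distHaversine call may be modest; it is worth timing both on a subset first.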
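Or use mutate with geosphere's distHaversine, which is vectorized: given two two-column (lon, lat) matrices, it returns one distance per row. distm is not suitable here, because it computes the full matrix of distances between every point in one set and every point in the other.
library(dplyr)
library(geosphere)
dm_kekk <- dm_kekk %>%
  mutate(dist = distHaversine(cbind(lon, lat), cbind(lon_ok, lat_ok)))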
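The vectorized version is fast because the haversine formula is evaluated on whole columns at once instead of once per row. A package-free sketch of the same formula in base R, in case installing geosphere is not an option; the function name haversine_vec is hypothetical, and the default radius of 6378137 m is the same default distHaversine uses.

```r
# Hypothetical helper: vectorized haversine distance (metres) between paired
# points. Inputs are numeric vectors in decimal degrees of equal length
# (length-1 inputs are recycled by R's usual rules).
haversine_vec <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- lon1 * to_rad; lat1 <- lat1 * to_rad
  lon2 <- lon2 * to_rad; lat2 <- lat2 * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin guards against sqrt(a) creeping just above 1 from rounding
  2 * r * asin(pmin(1, sqrt(a)))
}

# Usage on the question's data (column names from the question):
# dm_kekk$dist <- haversine_vec(dm_kekk$lon, dm_kekk$lat,
#                               dm_kekk$lon_ok, dm_kekk$lat_ok)
```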