In R, I have a simple for loop with a function inside. It takes a data frame and, for each row, computes the distance to the row directly before it, storing the result in the dists column. Everything works, but it takes a long time to run on over 120,000 rows (over 5 minutes). A faster (likely vectorized) way to do this would be greatly appreciated. For full disclosure, I have asked a similar question before, but the parameters I needed ended up changing and I was unable to adapt that answer to the new requirements.
Sample Data:
lat <- c(32.88084254, 32.88058801, 32.88034199, 32.88027623, 32.88022759)
lon <- c(-117.23543042, -117.23606292, -117.23654377, -117.23723468, -117.23788206)
tripData <- data.frame(cbind(lat, lon))
tripData["dists"] <- NA
for (i in 2:nrow(tripData)) {
  tripData$dists[i] <- geodist(tripData[i, c("lat")],
                               tripData[i, c("lon")],
                               tripData[i-1, c("lat")],
                               tripData[i-1, c("lon")],
                               units="km")*1000
}
Assuming that you are using the function geodist from the package gmt, its documentation states that it is already vectorized, so you can pass whole columns, shifted by one row against each other, in a single call:
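n <- nrow(tripData)
# rows 2..n are compared with rows 1..(n-1) in one call, for any number of rows;
# the first row has no predecessor, so it keeps NA as in the loop
tripData$dists <- c(NA, gmt::geodist(tripData$lat[-1], tripData$lon[-1],
                                     tripData$lat[-n], tripData$lon[-n],
                                     units = "km") * 1000)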
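The same shift-by-one idea works with any distance function that accepts vectors, not only gmt::geodist. The sketch below uses a hand-written haversine formula so it runs without extra packages. `haversine_m` is a hypothetical helper, and the spherical Earth with a 6371 km mean radius is an assumption, so its results may differ slightly from gmt::geodist.

```r
# Hypothetical helper: vectorized haversine distance in metres.
# Assumption: spherical Earth with mean radius 6371 km; gmt::geodist may use
# a different model, so small differences from its output are expected.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a))) * 1000
}

lat <- c(32.88084254, 32.88058801, 32.88034199, 32.88027623, 32.88022759)
lon <- c(-117.23543042, -117.23606292, -117.23654377, -117.23723468, -117.23788206)
tripData <- data.frame(lat, lon)

n <- nrow(tripData)
# Dropping the last / first element lines up row i - 1 with row i,
# so the whole column is computed with arithmetic on vectors, no R-level loop.
tripData$dists <- c(NA, haversine_m(tripData$lat[-n], tripData$lon[-n],
                                    tripData$lat[-1], tripData$lon[-1]))
print(tripData)
```

Because every step is plain vector arithmetic, the cost grows with the number of rows without the per-iteration overhead of indexing and assigning into a data frame inside a loop.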
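A small side note: stop doing data.frame(cbind(lat, lon)). You gain nothing compared to data.frame(lat, lon), and you risk a lot: cbind() first builds a matrix, which can hold only one type, so if any column is non-numeric (a character ID, say), every column, lat and lon included, is silently converted to character.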
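A minimal sketch of that risk, using a hypothetical character column `trip_id` added to two of the sample coordinates:

```r
lat <- c(32.88084254, 32.88058801)
trip_id <- c("a", "b")   # hypothetical non-numeric column

# cbind() builds a character matrix first, so lat is turned into text
via_cbind <- data.frame(cbind(lat, trip_id))
# data.frame() keeps each vector's own type
via_direct <- data.frame(lat, trip_id)

class(via_cbind$lat)   # "character" in R >= 4.0 ("factor" in older R)
class(via_direct$lat)  # "numeric"
```

Any later arithmetic on via_cbind$lat would fail or need an explicit as.numeric() conversion, which is why building the data frame directly is the safer habit.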
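You can apply a function element-wise over several argument vectors at once using mapply (the multivariate version of sapply). Note that mapply still calls the function once per pair of rows, so it removes the explicit loop and the repeated data-frame indexing rather than vectorizing the computation itself: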
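n <- nrow(tripData)
# tripData (capital D) as in the question; constant arguments go in MoreArgs
tripData$dists <- c(NA, mapply(geodist,
                               tripData$lat[-1], tripData$lon[-1],
                               tripData$lat[-n], tripData$lon[-n],
                               MoreArgs = list(units = "km")) * 1000)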
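To make the difference between mapply and a single vectorized call concrete, here is a small sketch with a hypothetical stand-in for geodist. `toy_dist` is not a real geographic distance; it only mimics the shape of the call (four coordinate vectors plus a constant option) and counts how many times it is invoked.

```r
# Hypothetical stand-in for geodist(): not a geodesic, just a vectorized
# function of four coordinate vectors plus a scalar option.
n_calls <- 0
toy_dist <- function(lat1, lon1, lat2, lon2, scale = 1) {
  n_calls <<- n_calls + 1   # count invocations in the global environment
  scale * sqrt((lat2 - lat1)^2 + (lon2 - lon1)^2)
}

lat <- c(32.88084254, 32.88058801, 32.88034199, 32.88027623, 32.88022759)
lon <- c(-117.23543042, -117.23606292, -117.23654377, -117.23723468, -117.23788206)
n <- length(lat)

# mapply() walks the four vectors in parallel; MoreArgs holds arguments
# passed unchanged to every call.
n_calls <- 0
via_mapply <- mapply(toy_dist, lat[-1], lon[-1], lat[-n], lon[-n],
                     MoreArgs = list(scale = 1000))
calls_mapply <- n_calls

# One call on whole vectors, as in the direct geodist() approach.
n_calls <- 0
via_vector <- toy_dist(lat[-1], lon[-1], lat[-n], lon[-n], scale = 1000)
calls_vector <- n_calls

all.equal(via_mapply, via_vector)               # TRUE
c(mapply = calls_mapply, vector = calls_vector) # 4 calls vs 1 call
```

Both give the same values, but mapply makes one function call per pair of rows (119,999 calls for 120,000 rows), so when the function already accepts vectors the direct call is usually the cheaper choice; mapply remains useful for functions that only accept scalars.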