Misc

Reputation: 645

Speeding up a simple for loop with vectorization in R

In R, I have a simple for loop with a function inside. It takes a data frame and looks at the row directly before to find the distance and then populates the dist column. Everything works perfectly but it takes a long time to run on over 120,000 rows (over 5 minutes). Finding a (likely vectorized) way to speed up this function would be greatly appreciated. Just for full disclosure, I have asked a similar question before, but the parameters I needed ended up changing and I was unable to adapt that answer to the new changes.

Sample Data:

lat <- c(32.88084254, 32.88058801, 32.88034199, 32.88027623, 32.88022759)
lon <- c(-117.23543042, -117.23606292, -117.23654377, -117.23723468, -117.23788206)
tripData <- data.frame(cbind(lat, lon))
tripData["dists"] <- NA


for (i in 2:nrow(tripData)) {
  tripData$dists[i] <- geodist(tripData[i, "lat"],
                               tripData[i, "lon"],
                               tripData[i - 1, "lat"],
                               tripData[i - 1, "lon"],
                               units = "km") * 1000
}

Upvotes: 0

Views: 113

Answers (2)

joran

Reputation: 173717

Assuming that you are using the function geodist from the package gmt, its documentation states that it is already vectorized, so you can pass whole columns in a single call instead of looping:

gmt::geodist(tripData[2:5, "lat"], 
        tripData[2:5, "lon"],
        tripData[1:4, "lat"], 
        tripData[1:4, "lon"],
        units="km")*1000

A small side note: stop doing data.frame(cbind(lat, lon)). You gain nothing compared to data.frame(lat,lon) and you risk much.
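To write the result back into the dists column for any number of rows (rather than the hard-coded 2:5 above), something along these lines should work. This is only a sketch: so that it runs without installing gmt, it uses a hypothetical stand-in, haversine_km, which assumes a spherical Earth of radius 6371 km and may give slightly different values than geodist. With gmt loaded, you would call geodist(..., units = "km") in its place.

```r
# Hypothetical stand-in for gmt::geodist (not part of any package):
# haversine great-circle distance in km, vectorized over all arguments.
# Assumption: spherical Earth with mean radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

lat <- c(32.88084254, 32.88058801, 32.88034199, 32.88027623, 32.88022759)
lon <- c(-117.23543042, -117.23606292, -117.23654377, -117.23723468, -117.23788206)
tripData <- data.frame(lat, lon)

n <- nrow(tripData)

# Rows 2..n are compared with rows 1..(n-1) in one vectorized call.
# The first row has no predecessor, so its distance stays NA.
tripData$dists <- c(NA,
                    haversine_km(tripData$lat[-1], tripData$lon[-1],
                                 tripData$lat[-n], tripData$lon[-n]) * 1000)
```

Negative indexing (lat[-1] and lat[-n]) keeps the two vectors aligned without spelling out the row ranges, and it also behaves sensibly for a one-row data frame, where 2:nrow(tripData) would count backwards.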

Upvotes: 4

Backlin

Reputation: 14872

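You can apply a function over several argument vectors in parallel using mapply (multivariate sapply). It still calls the function once per element, so it helps most when the function itself is not vectorized.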

n <- nrow(tripData)
# Pair each row (2..n) with the row before it (1..n-1)
mapply(geodist,
       tripData$lat[-1], tripData$lon[-1],
       tripData$lat[-n], tripData$lon[-n],
       MoreArgs = list(units = "km")) * 1000
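In case the mapply semantics are unclear, here is a small toy illustration. The function scaled_diff is made up for this example and has nothing to do with geodist. The vectors passed positionally are walked element by element, while anything in MoreArgs (note the capital M) is handed unchanged to every call.

```r
# Hypothetical toy function: absolute difference, optionally scaled.
scaled_diff <- function(a, b, scale = 1) abs(a - b) * scale

x <- c(1, 4, 9, 16)
n <- length(x)

# One call to scaled_diff per pair (x[i], x[i - 1]); scale = 1000 is
# supplied to every call through MoreArgs.
via_mapply <- mapply(scaled_diff, x[-1], x[-n],
                     MoreArgs = list(scale = 1000))

# scaled_diff is built from vectorized arithmetic, so a single direct
# call on the shifted vectors computes the same values without mapply.
direct <- scaled_diff(x[-1], x[-n], scale = 1000)

same_result <- isTRUE(all.equal(via_mapply, direct))
```

When the function is already vectorized, as in this toy case, the direct call is the faster choice. mapply is the fallback for functions that only accept scalars.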

Upvotes: 2
