Reputation: 39
As an R newbie I think it's time to step away from for loops and into the apply functions. I am struggling with this bit of code and am wondering if anyone can help.
I have a function:
earth.dist <- function (long1, lat1, long2, lat2)
{
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}
Now I have two data sets: one with a list of predetermined large cities and their long/lat coordinates, and one with random locations in America and their long/lat coordinates. The for loop I have written calculates the distance from each random location to every predetermined large city and assigns the random location to the State in which the closest city is located. Each city in the predetermined list has a State next to it, which is inserted into a new column in the random locations spreadsheet.
Is there a way I can do this loop using apply? This loop actually does do the trick but it is so long and bulky and I know that an apply function can do a better job.
Here is the loop:
for(i in 1:nrow(randomlocations)){
  vec <- vector()
  for(j in 1:nrow(predeterminedcities)){
    a <- earth.dist(randomlocations$long[i], randomlocations$lat[i],
                    predeterminedcities$long[j], predeterminedcities$lat[j])
    vec[[j]] <- a
  }
  ind <- as.numeric(which.min(vec))
  randomlocations$state[i] <- as.character(predeterminedcities$STATE[ind])
  print(i)
}
Upvotes: 1
Views: 80
Reputation: 9687
Since your function is already vectorized, you can use outer to compute a distance matrix by passing row indices into the two data frames. Negate that matrix and pass it through max.col, which then returns, for each row, the column index of the smallest distance; use that index to look up the state name:
# fake test data
randomlocations <- data.frame(lon = runif(100, -80, -70), lat = runif(100, 45, 75))
predeterminedcities <- head(randomlocations, 50)
predeterminedcities$STATE <- state.name

randomlocations$state <- predeterminedcities[
  max.col(
    -outer(1:nrow(randomlocations), 1:nrow(predeterminedcities),
           function(i, j) earth.dist(randomlocations$lon[i], randomlocations$lat[i],
                                     predeterminedcities$lon[j], predeterminedcities$lat[j]))
  ), "STATE"]
This would easily fit on one line if the variable names were shorter.
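If you would rather stay closer to the shape of your original loop, here is a minimal sketch using sapply and which.min instead of outer and max.col. It relies on the same vectorization: each call to earth.dist takes one random location and the full vectors of city coordinates. The toy coordinates below are made up purely for illustration, and the column names (long, lat, STATE) are assumed to match the ones in your question rather than my fake data above.

```r
# Haversine distance in km, same formula as the question's earth.dist
earth.dist <- function(long1, lat1, long2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad; a2 <- long1 * rad
  b1 <- lat2 * rad; b2 <- long2 * rad
  a <- sin((b1 - a1) / 2)^2 + cos(a1) * cos(b1) * sin((b2 - a2) / 2)^2
  6378.145 * 2 * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical toy data (illustrative only, not from the question)
predeterminedcities <- data.frame(
  long  = c(-74.0, -118.2, -87.6),
  lat   = c(40.7, 34.1, 41.9),
  STATE = c("New York", "California", "Illinois")
)
randomlocations <- data.frame(
  long = c(-73.8, -117.0, -88.0),
  lat  = c(41.0, 33.0, 42.0)
)

# For each random location, get the distances to all cities in one
# vectorized call and keep the index of the closest city
nearest <- sapply(seq_len(nrow(randomlocations)), function(i) {
  which.min(earth.dist(randomlocations$long[i], randomlocations$lat[i],
                       predeterminedcities$long, predeterminedcities$lat))
})

# Look up the state of the closest city
randomlocations$state <- as.character(predeterminedcities$STATE[nearest])
```

One difference worth knowing: which.min always returns the first minimum, whereas max.col breaks ties at random by default (ties.method = "random"), so the two approaches can disagree when two cities are exactly equidistant. This version also never builds the full distance matrix, which may matter if both data frames are large.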
Upvotes: 1