Nechama

Reputation: 39

Use one of the Apply Functions instead of a nested for loop in R

As an R newbie, I think it's time to step away from for loops and start using the apply functions. I am struggling with this bit of code and am wondering if anyone can help.

I have a function:

earth.dist <- function(long1, lat1, long2, lat2)
{
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145
  d <- R * c
  return(d)
}

Now I have two data sets: one with a list of predetermined large cities and their long/lat coordinates, and one with random locations in America and their long/lat coordinates. The for loop I have written calculates the distance from each random location to every predetermined large city and assigns the random location to the State in which the closest city is located. Each city in the predetermined list has a State next to it, which is inserted into a new column in the random locations data.

Is there a way I can do this using apply? The loop does the trick, but it is long and bulky, and I suspect an apply function can do a better job.

Here is the loop:

for (i in 1:nrow(randomlocations)) {
  vec <- vector()
  for (j in 1:nrow(predeterminedcities)) {
    a <- earth.dist(randomlocations$long[i], randomlocations$lat[i],
                    predeterminedcities$long[j], predeterminedcities$lat[j])
    vec[[j]] <- a
  }
  ind <- as.numeric(which.min(vec))
  randomlocations$state[i] <- as.character(predeterminedcities$STATE[ind])
  print(i)
}

Upvotes: 1

Views: 80

Answers (1)

Neal Fultz

Reputation: 9687

Since your function is already vectorized, you can use outer to compute a distance matrix by passing row indices into the two data frames. Negate that matrix and pass it to max.col, which then returns, for each random location, the column index of the smallest distance (note that max.col breaks ties at random by default). Use that index to look up the state name:

# fake test data (note: the longitude column is named lon here, long in the question)
randomlocations <- data.frame(lon=runif(100, -80,-70), lat=runif(100, 45,75))
predeterminedcities <- head(randomlocations, 50)
predeterminedcities$STATE <- state.name

randomlocations$state <- predeterminedcities[
  max.col( -
    outer(1:nrow(randomlocations), 1:nrow(predeterminedcities), 
      function(i,j) earth.dist(randomlocations$lon[i], randomlocations$lat[i], 
                             predeterminedcities$lon[j], predeterminedcities$lat[j])
    )
  ), "STATE"]

This would easily fit on one line if the variable names were shorter.
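
Since the question asks specifically about the apply family, here is a minimal sketch of the same nearest-city lookup using vapply (sapply would work the same way). It relies on earth.dist being vectorized, so the inner loop over cities disappears. The test data are made up for illustration, and the column names long, lat and STATE are assumed to match the ones in the question.

```r
# Haversine distance in km, as defined in the question (vectorized over its arguments)
earth.dist <- function(long1, lat1, long2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- sin(dlat / 2)^2 + cos(a1) * cos(b1) * sin(dlon / 2)^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  6378.145 * c
}

# Made-up data for illustration only; column names are assumed from the question
set.seed(1)
randomlocations <- data.frame(long = runif(100, -80, -70),
                              lat  = runif(100, 45, 75))
predeterminedcities <- data.frame(long = runif(50, -80, -70),
                                  lat  = runif(50, 45, 75),
                                  STATE = state.name,
                                  stringsAsFactors = FALSE)

# For each random location, compute the distance to all cities in one call
# and keep the row index of the nearest city
nearest <- vapply(seq_len(nrow(randomlocations)), function(i) {
  which.min(earth.dist(randomlocations$long[i], randomlocations$lat[i],
                       predeterminedcities$long, predeterminedcities$lat))
}, integer(1))

# Look up the state of the nearest city
randomlocations$state <- predeterminedcities$STATE[nearest]
```

Compared with the outer version, this never builds the full distance matrix, and which.min always returns the first minimum, so ties are resolved deterministically.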

Upvotes: 1
