Andreas
Andreas

Reputation: 167

Is it possible to optimize this lookup function? (Faster IF Statement)

Is it possible to optimize this code?

It takes around 2 seconds to run once on my data and since i have to run it repeatedly it adds quite a time to the whole program.

This code sets 2(f1,f2) geofences and checks whether a point in node_coords is within one of those fences. As result it produces a logical vector index that can be used to filter node_coords and only leave those points in place which are in one of those 2 geofences.

Thank you very much in advance! BR Andreas

library("vctrs")



node_coords<-structure(list(lon = c(11.34175, 12.2063556, 12.2066937, 12.2068632, 
12.2070187, 12.2078502), lat = c( 48.27649, 47.8399432, 47.8397677, 
47.8396466, 47.8396952, 47.8395169)), row.names = c(172422L,
260117L, 147288L, 1337832L, 1850176L, 260151L), class = "data.frame")


check_if_point_is_within_geofence <- function(top, left, bottom, right, latitude, longitude){
  # Check latitude bounds first.
  if(top >= latitude && latitude >= bottom){
    # If your bounding box doesn't wrap 
    #              the date line the value
    #               must be between the bounds.
    #               If your bounding box does wrap the 
    #               date line it only needs to be  
    #               higher than the left bound or 
    #               lower than the right bound. 
    if(left <= right && left <= longitude && longitude <= right){
      return(TRUE)
    } else if(left > right && (left <= longitude || longitude <= right)) {
      return(TRUE) 
    }
  }
  return(FALSE)
}

geofence <- function(lon,lat){
  f1 <- base::data.frame("left" = 11.34175, "bottom" = 47.98702 ,"right" = 11.77417 ,"top" = 48.27649)
  f2 <- base::data.frame("left" = 12.10723, "bottom" = 47.84540, "right" = 12.15024, "top" = 47.87435 )
  
  fences <- rbind.data.frame(f1,f2)
  f_list <- apply(fences,1,function(x)  check_if_point_is_within_geofence(top = x[4],left = x[1],bottom = x[2],right = x[3],latitude = lat,longitude = lon ) )
  
  if (vec_in(TRUE,f_list))
    return(TRUE)
  return(FALSE)
}

index <- apply(cbind(node_coords$lon,node_coords$lat),1,function(x)  geofence(x[1],x[2]) )

Upvotes: 1

Views: 66

Answers (1)

MkWTF
MkWTF

Reputation: 1372

This would be an optimized version of your code:

vec_geofence <- function(top, left, bottom, right, lat, lon) {

  # The mask vector represents whether a coordinate is seen in any of the
  #   fences defined by the top, left, bottom and right vectors. In the beginning
  #   all the coordinates haven't been tested, so the respective value in the
  #   mask vector is initialized as False.
  mask <- rep(F, length(lon))
  
  # For each fence...
  for(i in seq_along(top)) {

    # ... check for all the coordinates if they are inside of the fence
    if( left[i] > right[i] )
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon | lon <= right[i])
    else
      new_mask <- top[i] >= lat & lat >= bottom[i] & (left[i] <= lon & lon <= right[i])
    
    # For all the coordinates that hadn't yet been seen in a fence, and that
    #   are inside the current fence, update the respective mask value to True
    mask[!mask][new_mask] <- T

    # The coordinates that will pass through to the next fence check are the ones
    #   that still haven't been seen inside a fence
    lat <- lat[!new_mask]
    lon <- lon[!new_mask]
  }
  
  mask
}

vec_geofence(fences$top, fences$left, fences$bottom, fences$right, node_coords$lat, node_coords$lon)
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE

There are 4 main things that I changed:

  1. Moved the fences dataframe outside the geofence function, so its not created everytime you run the function
  2. Converted the if statements in check_if_point_is_within_geofence into logic formulas
  3. Merged both functions into one so you avoid function call delays due to that
  4. Converted the single valued logic formulas into vector logic formulas

This function takes 5s to calculate the geofences for a node_coords dataframe with 10 000 rows and with a fences dataframe with 10 000 rows too.

node_coords_10k = do.call(rbind.data.frame, rep(list(node_coords), 10000/6))
fences_10k = do.call(rbind.data.frame, rep(list(fences), 10000/2))

system.time(vec_geofence(
    fences_10k$top, fences_10k$left, fences_10k$bottom, fences_10k$right, 
    node_coords_10k$lat, node_coords_10k$lon
))
#> user  system elapsed 
#> 4.78    0.03    4.85 

Upvotes: 3

Related Questions