user21359

Reputation: 476

Reverse geocoding speed

I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.

I have some toy data here:

latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)

I write a function to do the actual geocoding:

require(ggmap)
get_address <- function(df) {
  ## apply() passes each row as a named vector, so index with [[ rather than $
  long <- as.numeric(df[["longitude"]])
  lat  <- as.numeric(df[["latitude"]])
  revgeocode(c(long, lat))
}

Then apply:

apply(x, 1, get_address)

Using system.time(), this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?

EDIT:

I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?

Upvotes: 3

Views: 1281

Answers (2)

SymbolixAU

Reputation: 26258

You could find a 'spatial' data source of the boroughs, then perform a point-in-polygon operation using the sf library.


Setting up the data

Find a spatial data source. Here is one with the New York City boroughs in GeoJSON format:

library(sf)

sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")

Convert your coordinates into an sf object. Note I've swapped your lat & lon column order.

latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(longitude, latitude)

sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))

To perform spatial operations, the coordinate reference system needs to match between the two geometries:

## set the coordinate reference systems to be the same
st_crs(sf_x) <- st_crs(sf)
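Equivalently, you can set the CRS when the points are created. A one-step alternative, assuming the GeoJSON layer is in EPSG:4326 / WGS84 (the coordinate system GeoJSON uses):

## one-step alternative: assign the CRS at creation time
sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"), crs = 4326)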

Point-in-polygon calculation

Use st_within to find the polygon (borough) each point falls within

res <- st_within(sf_x, sf)  ## return the indexes of sf that sf_x are within

This gives you a sparse list of the indexes of the polygons that each point falls in

## view the results
sapply(res, function(x) as.character(sf$name[x]))
# [1] "Bronx"     "Queens"    "Manhattan" "Brooklyn"  "Queens" 

Visual

Confirm with a plot

library(googleway)

x$neighbourhood <- sapply(res, function(x) as.character(sf$name[x]))

mapKey <- "map_api_key"

google_map(key = mapKey) %>%
  add_markers(data = x, info_window = "neighbourhood")

[image: Google map of the five points, with info windows showing the matched borough]



Upvotes: 4

sequoia

Reputation: 480

As far as I know, Google's free API is limited to 2,500 requests per day. Alternatively, OSM provides Nominatim, though there is no dedicated R interface for it. Either way, for this amount of data I wouldn't use a web service at all. Do you have a licence for ArcGIS?
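If you do want to try Nominatim from R, here is a minimal sketch using the httr and jsonlite packages against the public endpoint (the revgeocode_osm name and the user-agent string are my own; note the usage policy allows roughly one request per second, so it won't scale to a million rows either):

library(httr)
library(jsonlite)

## reverse geocode one point against the public Nominatim endpoint
revgeocode_osm <- function(lat, lon) {
  res <- httr::GET(
    "https://nominatim.openstreetmap.org/reverse",
    query = list(format = "json", lat = lat, lon = lon),
    httr::user_agent("my-r-script")  ## Nominatim requires an identifying user agent
  )
  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))$display_name
}

revgeocode_osm(40.84935, -73.87119)
Sys.sleep(1)  ## stay within the ~1 request/second usage policy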

You could also tidy the function up by avoiding the intermediate assignments, although the network request, not the assignments, dominates the running time:

require(ggmap)
get_address <- function(df) {
  ## [[ works for both data frames and the named vectors apply() passes
  revgeocode(c(as.numeric(df[["longitude"]]), as.numeric(df[["latitude"]])))
}
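As an aside (my suggestion): apply() coerces the data frame to a matrix before iterating, which you can avoid by looping over the two columns in parallel with mapply. revgeocode still makes one web request per point, so this is a style change rather than a speed-up:

addresses <- mapply(
  function(lon, lat) revgeocode(c(lon, lat)),
  x$longitude, x$latitude
)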

Upvotes: 0
