Reputation: 3495
I want to make grids (in the sense of data frames of x- and y-coordinates) over the US, or regions of the US, throwing out any points in the original grid that are beyond the borders of the US. I have some code that seems to work, but it's quite slow when I shrink the grid increment down to 1 km (1e3
) or so. How can this be done faster? Perhaps there's a way to build the simple feature collection that I need without a lapply
or loop, or perhaps this can be done with a MULTIPOINT
instead of a simple feature collection of POINT
s.
library(sf)
crs.us.atlas = 2163 # https://epsg.io/2163
us = read_sf(paste0("/vsizip/",
"/tmp/us.zip")) # from: https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip
# Filter down to the lower 48 + DC.
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = crs.us.atlas)
l = as.list(st_bbox(us))
increment = 1e5
g = expand.grid(
x = seq(l$xmin, l$xmax, by = increment),
y = seq(l$ymin, l$ymax, by = increment))
message("Running the slow part")
print(system.time(g <- g[0 < sapply(FUN = length, st_intersects(
st_as_sfc(crs = crs.us.atlas, lapply(1 : nrow(g), function(i)
st_point(as.numeric(g[i, c("x", "y")])))),
us)),]))
print(nrow(g))
Upvotes: 2
Views: 1164
Reputation: 3495
sf
has a function st_make_grid
that could help with this (in older versions of sf
, it even automatically cropped to the polygon), but curiously, it's quite slow as of this writing.
I can get reasonable performance without additional dependencies by using st_as_sf
to convert g
to a simple feature collection of points:
g = st_as_sf(crs = crs.us.atlas, coords = c(1, 2), g)
Then, following @agila's use of unlist
, the slow part becomes
print(system.time(g <- g[unlist(st_intersects(us, g)),]))
which, with increment = 1e3
, takes 3 minutes on a high-end server.
Upvotes: 1
Reputation: 780
Alternatively, you could use the fasterize
package to create a raster grid around the shape file with your desired resolution and then extract the grid coordinates using the raster::rasterToPoints
function.
This works almost immediately for gathering the xy locations. To convert back to an sf object then takes about 10 seconds or so.
library(sf)
library(fasterize)
library(raster)
crs.us.atlas = 2163
# https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip
us = read_sf(list.files(pattern='.shp$'))
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = crs.us.atlas)
increment = 1e3
# create the empty raster to for fasterize base
usrast = raster::raster(us, resolution=rep(increment, 2))
# rasterize the shapefile
r = fasterize::fasterize(us, usrast)
# extract raster cell coordinates
coords = rasterToPoints(r)
# convert coordinates to sf
shp = coords %>%
as.data.frame() %>%
st_as_sf(crs=crs.us.atlas, coords=c('x', 'y'))
Upvotes: 2
Reputation: 3472
I would solve the problem as follows. First, load the packages. tmap
is used just for the map, you can easily ignore that
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(sfheaders)
library(tmap)
Download and read-in data
temp_zip <- tempfile(fileext = ".zip")
download.file(
"https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip",
destfile = temp_zip
)
unzip(temp_zip, exdir = tempdir())
us <- st_read(file.path(tempdir(), "cb_2019_us_state_500k.shp"))
#> Reading layer `cb_2019_us_state_500k' from data source `C:\Users\Utente\AppData\Local\Temp\RtmpCakscO\cb_2019_us_state_500k.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 56 features and 9 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
#> geographic CRS: NAD83
Filter down to the lower 48 + DC.
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = 2163)
Define increment, grid and sfc
object. The key part is to create and sfc_point
object for subsequent operations
increment = 1e5
g = expand.grid(
x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)
g_sfc <- sfc_point(as.matrix(g)) %>%
st_set_crs(2163)
Find the ID(s) of points included in the US
my_ids <- unlist(st_contains(us, g_sfc))
Visualise result
tm_shape(g_sfc) +
tm_dots(col = "grey", size = 0.05) +
tm_shape(g_sfc[my_ids]) +
tm_dots(col = "darkred", size = 0.05) +
tm_shape(us) +
tm_borders(lwd = 1.3)
Repeat for 1e3 (but I won't add any plot since that's almost 13 million points)
increment = 1e3
g = expand.grid(
x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)
It takes approximately 20 seconds to generate the data
system.time({
g_sfc <- sfc_point(as.matrix(g)) %>%
st_set_crs(2163)
})
#> user system elapsed
#> 16.29 0.92 17.27
and 80 seconds to find the IDs of the points within US.
system.time({
my_ids <- unlist(st_contains(us, g_sfc))
})
#> user system elapsed
#> 67.75 8.32 80.86
Created on 2021-01-13 by the reprex package (v0.3.0)
If you need something even more efficient, I suggest you polyclip
.
Upvotes: 3