jrp
jrp

Reputation: 21

Random geospatial sampling from a grid design

I am trying to create a spatial survey design by first putting a grid over a study site and then sampling randomly within each grid cells. Here is a made-up example that creates a grid of hexagons over the state of Montana. The problem is that st_sample provides random points over the entire shape, not within the grid. I tried the other types within st_sample and they do not work. I suspect that st_sample doesn't see my grid as multiple polygons?

Any ideas, either using these packages/functions or others in R?

  States <- raster::getData("GADM", country = "United States", level = 1)
  Montana <- States[States$NAME_1 == "Montana",]
  MT <- sf::st_as_sf(Montana)
  
  
  grid <- MT %>% 
    st_make_grid(cellsize = 1, what = "polygons", square = FALSE) %>% 
    st_intersection(MT)
  
  set.seed(51)
  pts_sf <- st_sample(grid, size = 78, type = 'random')
  
  ggplot() + 
    geom_sf(data = MT) + 
    geom_sf(data = grid) + 
    geom_sf(data = pts_sf)

Upvotes: 2

Views: 560

Answers (1)

dimfalk
dimfalk

Reputation: 1413

?st_sample revealed there is a by_polygon = TRUE parameter for MULTIPOLYGON geometries which sounded very promising but this can't be that easy, right?

This is how I modified your code, basically a st_combine call followed by st_as_sf on your grid in order to obtain a simple feature collection of type MULTIPOLYGON:

States <- raster::getData("GADM", country = "United States", level = 1)
Montana <- States[States$NAME_1 == "Montana",]
MT <- sf::st_as_sfc(Montana)

grid <- MT |> 
  sf::st_make_grid(cellsize = 1, what = "polygon", square = FALSE) |> 
  sf::st_intersection(MT)  |> 
  sf::st_as_sf()

grid_mp <- sf::st_combine(grid) |> sf::st_as_sf() 

set.seed(51)
pts_sf <- sf::st_sample(grid_mp, size = dim(grid)[1], type = "random", by_polygon = TRUE)

... leading to the following error message:

Error in s2_geography_from_wkb(x, oriented = oriented, check = check) : 
  Evaluation error: Found 1 feature with invalid spherical geometry.
[1] Loop 44 edge 5 has duplicate near loop 47 edge 3.

So I wrote a little function in order to understand what is happening here (don't hate me for this inelegant approach):

st_sample_per_feature <- function(data) {
  
  n <- dim(data)[1]
  
  list <- list()
  
  for (i in 1:n) {
    
    list[[i]] <- sf::st_sample(data[[1]][i], size = 1, type = "random") |> sf::st_as_sf()
  }
  
  dplyr::bind_rows(list)
}

pts_full <- st_sample_per_feature(grid)

pts_full
#> Simple feature collection with 78 features and 0 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -116.0416 ymin: 44.55168 xmax: -104.0458 ymax: 48.99927
#> CRS:           +proj=longlat +datum=WGS84
#> First 10 features:
#>                             x
#> 1  POINT (-115.6625 47.71146)
#> 2  POINT (-116.0416 48.99488)
#> 3  POINT (-115.3728 47.32979)
#> 4  POINT (-115.8625 48.48905)
#> 5  POINT (-114.5906 47.82194)

The output looks quite promising on the whole but the plot shows clear patterns along the border:

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = MT) + 
  ggplot2::geom_sf(data = grid, color = "red", fill = ggplot2::alpha("grey", 0.2)) + 
  ggplot2::geom_sf(data = pts_full)

sampling distribution

That's when I started to count the grid cells manually. 62 of them are visible but grid consists of 78 objects. st_area(grid) |> sort(decreasing = TRUE) shows there are exact 16 polygon objects in grid having a significantly smaller area than the rest.

Unfortunately, this is not a solution to your problem at this point but I strongly assume st_sample(x, by_polygon = TRUE) fails in the first place because of these artifacts resulting from the way you call st_make_grid().

So basically it seems like the problem here is not the sampling-per-polygon but the definition of a suitable and geometrically correct (?) grid.

Upvotes: 1

Related Questions