Kawi
Kawi

Reputation: 19

Reading large shapefile in R using using sf package

I have a shapefile of points with 32M observations. I want to load it in R and I have tried read_sf and st_read but my R session keeps getting crashed. One other way that came to my mind was to write a for loop, subsetting columns that I want and maybe going a specific number of rows at a time and then rbinding them, but cannot figure out how to make R understand the query. Here's what I have so far which is not working:

for (i in 1:10) {
  j = i-1
  jj = i+1
  print(i)
  print(j)
  print(jj)
  A <- read_sf("C:\\Users\\...parcels-20210802T125336Z-001\\parcels\\join_L3_Mad_Addresses.shp", query = "SELECT FID, CENTROID_I, LOC_ID FROM join_L3_Mad_Addresses WHERE FID < "jj" AND FID > "j"")
}

Upvotes: 1

Views: 2360

Answers (1)

agila
agila

Reputation: 3472

I think you can readapt the following code.

Load packages

library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

Define path to a shapefile

dsn <- system.file("shape/nc.shp", package="sf")

Count the number of features in dsn

st_layers(dsn, do_count = TRUE)
#> Driver: ESRI Shapefile 
#> Available layers:
#>   layer_name geometry_type features fields
#> 1         nc       Polygon      100     14

Start a for loop to read 10 features at a time. Add the data to a list

shp_data_list <- list()
i <- 1
for (offset in seq(10, 100, by = 10)) {
  query <- paste0("SELECT * FROM nc LIMIT ", 10, " OFFSET ", offset - 10)
  shp_data_list[[i]] <- st_read(dsn, query = query, quiet = TRUE)
  gc(verbose = FALSE)
  i <- i + 1
}

Rbind the objects

shp_data <- do.call(rbind, shp_data_list)

Add an ID column (just for plotting)

shp_data$ID <- as.character(rep(1:10, each = 10))
plot(shp_data["ID"])

The only problem is that this process may not preserve the geometry type. For example,

unique(st_geometry_type(shp_data))
#> [1] MULTIPOLYGON POLYGON     
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

while

unique(st_geometry_type(st_read(dsn, quiet = TRUE)))
#> [1] MULTIPOLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

You can change the geometry type with st_cast()

Created on 2021-08-03 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions