M. Hahn

Intersecting big spatial datasets in R with sf

I have two spatial datasets. One contains lots of polygons (more than 150k in total) describing different features, such as rivers and vegetation. The other contains far fewer polygons (500) describing different areas. I need to intersect the two datasets to get the features within each area. I can subset the first dataset by feature type. If I use the subset for a small feature (2,500 polygons), the intersection with the areas is quite fast (5 min). But if I intersect a bigger feature subset (20,000 polygons), the computation runs for a very long time (I terminated it after two hours). And this is not even the biggest feature (50,000 polygons) I need to intersect.

This is the code snippet I run:

    library(sf)
    library(dplyr)  # for %>% and select()

    clean_intersect_save = function(geo_features, areas) {

      # make geometries valid in parallel (st_parallel is an external helper, not part of sf)
      data_valid_geoms = st_parallel(sf_df = st_geometry(geo_features),
                                     sf_func = st_make_valid,
                                     n_cores = 4)

      # keep only the "feature" column and reattach the repaired geometries
      data_clean = st_drop_geometry(geo_features) %>% select("feature")
      data_valid = st_sf(data_clean, geometry = data_valid_geoms)

      # intersect the geo-features with the areas
      data_valid_split = st_parallel(sf_df = areas,
                                     sf_func = st_intersection,
                                     n_cores = 4,
                                     data_valid)

      # save as a shapefile
      st_write(data_valid_split, "data_valid_split.shp")

      return(data_valid_split)
    }

Both inputs are sf data frames. st_parallel is a helper function I found here (it is not part of sf).
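A minimal sketch of a different technique (not the approach used in the question or in the answer): computing new clipped geometries is often the expensive part of an intersection, so features lying completely inside a single area can be kept as they are, and only the features that cross an area boundary are passed to st_intersection(). The function name intersect_split() and the toy squares are hypothetical, and the sketch assumes the areas do not overlap each other. How much time this saves depends on how many features straddle area boundaries.

```r
library(sf)

# Hypothetical helper: clip only the features that are not fully inside one area.
# Assumes the areas do not overlap each other (e.g. administrative districts).
intersect_split <- function(features, areas) {
  # Declare attributes constant to silence sf's attribute-geometry warning
  features <- st_set_agr(features, "constant")
  areas <- st_set_agr(areas, "constant")

  # For each feature, the indices of the areas that completely cover it
  covered_by <- st_covered_by(features, areas)
  inside <- lengths(covered_by) == 1

  # Fully covered features keep their original geometry; they only need the area attributes
  whole <- st_join(features[inside, ], areas, join = st_covered_by)

  # Only the remaining features (crossing a boundary or outside every area) are clipped
  clipped <- st_intersection(features[!inside, ], areas)

  rbind(whole, clipped)
}

# Toy data (hypothetical): axis-aligned squares
square <- function(x0, y0, size) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}
features <- st_sf(feature = c("river", "forest", "forest"),
                  geometry = st_sfc(square(0, 0, 1), square(1.5, 1.5, 1), square(10, 10, 1)))
areas <- st_sf(area_id = c(1L, 2L),
               geometry = st_sfc(square(-1, -1, 3), square(5, 5, 2)))

res <- intersect_split(features, areas)
```

Here the first square lies entirely inside area 1 and is kept unchanged, the second one crosses the boundary of area 1 and is clipped, and the third one touches no area and is dropped.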

My question is: how would experienced spatial data people usually solve such a task? Do I just need more cores and/or more patience? Am I using sf wrong? Is R/sf the wrong tool?

Thanks for any help. This is my very first spatial data analysis project, so sorry if I have overlooked something obvious.

Answers (1)

M. Hahn

As a real answer to this vague question probably won't come, I will answer it myself.

Thanks @Chris and @TimSalabim for the help. I ended up with a combination of both ideas.

In the end I used PostGIS, which in my experience is a pretty intuitive way to work with spatial data. The three things that sped up the intersection calculations for me are:
