Ankit Sagar
Ankit Sagar

Reputation: 91

R: st_within is taking too long for computation of spatial object

I have two shp files.

I want to know only those shpfile_1 falling completely inside (100%) of shpfile_2.

The yellow polygon : 100% inside

library(sf)
shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1   <- st_transform(shp_1,2154);shp_1
shp_1   <-  st_union(shp_1)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2   <- nc[1,]
shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=5000)

within <- st_within(shp_2 , shp_1)


The problem is st_within is taking too much time for 6 Million of polygon.

Is there any way to make the process fast?

Upvotes: 1

Views: 1502

Answers (1)

Bastien
Bastien

Reputation: 3097

I'm not sure it will help, but it might worth investigating changing your polygons (shp_1) to lines. If you have lots of small polygones in very big polygons, it might worth it.

So your approach:

library(sf)
shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1   <- st_transform(shp_1,2154);shp_1
shp_1   <-  st_union(shp_1)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2   <- nc[1,]
shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=1000)
within <- st_within(shp_2 , shp_1)
within_TF <- sapply(within, function(xx) length(xx)!=0)

The idea is to transform the shp_a polygon into a line shape, than use st_intersects to find all smaller shp_2 polygons that touches the line and remove them from the original shape. Now, if you shp_2 is larger than shp_1 you have first to extract all smaller polygons that do not intersects. This is done with st_intersects and might actually defeat the purpose as it may be slow... Anyway, in this example it's some time better.

inter <- st_intersects(shp_2, shp_1)   ## find the intersects
inter_TF <- sapply(inter, function(xx) length(xx)!=0)  ## prepare the filter
shp_2_touches <- shp_2[inter_TF,]   ## apply the filter

shp_1_line <- st_cast(shp_1, "MULTILINESTRING")  ## convert to lines
inter_line <- st_intersects(shp_2_touches, shp_1_line)  ## find the polygons that touches the line
inter_line_TF <- sapply(inter_line, function(xx) length(xx)!=0)  ## prepare the filter

With that done, you can see that both extraction are the same:

identical(shp_2[within_TF,], shp_2_touches[!inter_line_TF,])
[1] TRUE

For speed, it's not a super gain, but on my horrible computer, it was somehow faster...

microbenchmark::microbenchmark(within = st_within(shp_2 , shp_1),
                               multiline = {
                                 inter <- st_intersects(shp_2, shp_1)
                                 inter_TF <- sapply(inter, function(xx) length(xx)!=0)
                                 shp_2_touches <- shp_2[inter_TF,]
                                 
                                 shp_1_line <- st_cast(shp_1, "MULTILINESTRING")
                                 inter_line <- st_intersects(shp_2_touches, shp_1_line)
                               })
Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval
    within 405.6318 466.0014 760.2502 617.4705 817.4921 2177.999   100
 multiline 197.9589 258.3050 523.7598 399.6880 659.9645 2239.103   100

There is not much way around st_intersects. It generally perform pretty well, but if it's slow in your case, there is not that many way around it. In most cases, you will want to reduce the size of your problem by doing multiple smaller intersection and data reduction.

Upvotes: 3

Related Questions