Reputation: 91
I have two shp files.
I want to know only those shpfile_1 falling completely inside (100%) of shpfile_2.
library(sf)
shp_1 <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1 <- st_transform(shp_1,2154);shp_1
shp_1 <- st_union(shp_1)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2 <- nc[1,]
shp_2 <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=5000)
within <- st_within(shp_2 , shp_1)
The problem is st_within is taking too much time for 6 Million of polygon.
Is there any way to make the process fast?
Upvotes: 1
Views: 1502
Reputation: 3097
I'm not sure it will help, but it might worth investigating changing your polygons (shp_1) to lines. If you have lots of small polygones in very big polygons, it might worth it.
So your approach:
library(sf)
shp_1 <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1 <- st_transform(shp_1,2154);shp_1
shp_1 <- st_union(shp_1)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2 <- nc[1,]
shp_2 <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=1000)
within <- st_within(shp_2 , shp_1)
within_TF <- sapply(within, function(xx) length(xx)!=0)
The idea is to transform the shp_a
polygon into a line shape, than use st_intersects
to find all smaller shp_2
polygons that touches the line and remove them from the original shape. Now, if you shp_2
is larger than shp_1
you have first to extract all smaller polygons that do not intersects. This is done with st_intersects
and might actually defeat the purpose as it may be slow... Anyway, in this example it's some time better.
inter <- st_intersects(shp_2, shp_1) ## find the intersects
inter_TF <- sapply(inter, function(xx) length(xx)!=0) ## prepare the filter
shp_2_touches <- shp_2[inter_TF,] ## apply the filter
shp_1_line <- st_cast(shp_1, "MULTILINESTRING") ## convert to lines
inter_line <- st_intersects(shp_2_touches, shp_1_line) ## find the polygons that touches the line
inter_line_TF <- sapply(inter_line, function(xx) length(xx)!=0) ## prepare the filter
With that done, you can see that both extraction are the same:
identical(shp_2[within_TF,], shp_2_touches[!inter_line_TF,])
[1] TRUE
For speed, it's not a super gain, but on my horrible computer, it was somehow faster...
microbenchmark::microbenchmark(within = st_within(shp_2 , shp_1),
multiline = {
inter <- st_intersects(shp_2, shp_1)
inter_TF <- sapply(inter, function(xx) length(xx)!=0)
shp_2_touches <- shp_2[inter_TF,]
shp_1_line <- st_cast(shp_1, "MULTILINESTRING")
inter_line <- st_intersects(shp_2_touches, shp_1_line)
})
Unit: milliseconds
expr min lq mean median uq max neval
within 405.6318 466.0014 760.2502 617.4705 817.4921 2177.999 100
multiline 197.9589 258.3050 523.7598 399.6880 659.9645 2239.103 100
There is not much way around st_intersects
. It generally perform pretty well, but if it's slow in your case, there is not that many way around it. In most cases, you will want to reduce the size of your problem by doing multiple smaller intersection and data reduction.
Upvotes: 3