Reputation: 155
The function gIntersects()
in rgeos
package to test if point is located in polygons is pretty slow. Is there a good way to speed up the computation?
Upvotes: 0
Views: 1261
Reputation: 6416
I tried a speed comparison between sp::point.in.polygon
, sp::over
and rgeos::gIntersects()
which were already mentioned in the comments. Note that there is also a point.in.poly
function in {spatialEco}
but seems that is just a wrapper of sp::over
.
I realized that sp::point.in.polygon
doesn't handle well multi-part polygons (as also pointed here) and needs to be provided with raw coordinates (so I presume that for multi-part polygons needs to be used in a loop). Note that, sp::point.in.polygon
it was faster than all others only in the case of a square polygon, which makes me think is faster only for simpler shapes. All in all, whenever hitting some speed issues, juts try to test for your specific case. For my specific choice of examples, sp::over
seems a better choice overall, but I would not generalize. Hope my examples are ok, otherwise feel free to correct me.
Since there is no data provided, I used some examples below.
library(rgeos)
library(sp)
library(microbenchmark)
library(ggplot2)
library(maps)
library(maptools)
library(raster)
# Get world map data
# (conversion code from "Applied Spatial Data Analysis with R")
worldmap <- maps::map("world", fill=TRUE, plot=FALSE)
# transform to SpatialPolygons
worldmapPolys <- maptools::map2SpatialPolygons(worldmap,
IDs=sapply(strsplit(worldmap$names, ":"), "[", 1L),
proj4string=CRS("+proj=longlat +datum=WGS84"))
# Generate random points for entire world
set.seed(2017)
pts <- sp::spsample(worldmapPolys, n=10^5, type="random")
# Define functions to test for speed
gIntersects_tst <- function(my.pts, my.poly){
rgeos::gIntersects(spgeom1 = my.pts,
spgeom2 = my.poly,
byid = TRUE)
}
over_tst <- function(my.pts, my.poly){
sp::over(x = my.pts, y = my.poly)
}
point.in.polygon_tst <- function(my.pts, my.poly){
# get coordinates from polygon
XY <- raster::geom(my.poly)
sp::point.in.polygon(point.x = my.pts@coords[,1],
point.y = my.pts@coords[,2],
pol.x = XY[,5],
pol.y = XY[,6],
mode.checked = TRUE)
}
# Micro-benchmarking
# The idea is to test which points fall into a selected polygon (country)
res <- microbenchmark(TF1 <- gIntersects_tst(pts, worldmapPolys[183,]),
TF2 <- gIntersects_tst(worldmapPolys[183,], pts),
idx <- over_tst(pts, worldmapPolys[183,]),
codes <- point.in.polygon_tst(pts, worldmapPolys[183,]))
print(res)
## Unit: milliseconds
## expr min
## TF1 <- gIntersects_tst(pts, worldmapPolys[183, ]) 142.61992
## TF2 <- gIntersects_tst(worldmapPolys[183, ], pts) 125.99551
## idx <- over_tst(pts, worldmapPolys[183, ]) 50.72425
## codes <- point.in.polygon_tst(pts, worldmapPolys[183, ]) 224.57961
## lq mean median uq max neval cld
## 153.46915 174.42346 162.90885 177.69223 338.2691 100 b
## 136.13762 158.88218 144.89180 156.91664 352.3276 100 b
## 55.50899 69.67542 63.80366 78.12026 132.8704 100 a
## 243.12288 276.71458 257.38068 275.46144 589.9082 100 c
ggplot2::autoplot(res) + ggtitle("single-polygon: 100 evaluations")
Note that for gIntersects()
, order of arguments seems to matter. Differences are both in speed and structure of results.
identical(TF1,TF2)
## [1] FALSE
identical(TF1[,1:length(pts)], TF2[1:length(pts),])
## [1] TRUE
class(TF1); str(TF1)
## [1] "matrix"
## logi [1, 1:100000] FALSE FALSE FALSE FALSE FALSE FALSE ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr "Romania"
## ..$ : chr [1:100000] "1" "2" "3" "4" ...
class(TF2); str(TF2)
## [1] "matrix"
## logi [1:100000, 1] FALSE FALSE FALSE FALSE FALSE FALSE ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:100000] "1" "2" "3" "4" ...
## ..$ : chr "Romania"
# Subset world points
pts.gI1 <- pts[TF1,]
pts.gI2 <- pts[TF2,]
pts.ovr <- pts[!is.na(idx),]
pts.PiP <- pts[as.logical(codes),]
# All subsets are identical
identical(pts.gI1, pts.gI2)
## [1] TRUE
identical(pts.gI2, pts.ovr)
## [1] TRUE
identical(pts.ovr, pts.PiP)
## [1] TRUE
# Generate two square polygons
grd <- sp::GridTopology(c(1,1), c(1,1), c(2,1))
polys <- sp::as.SpatialPolygons.GridTopology(grd)
# Generate some random points
set.seed(2017)
pts2 <- sp::spsample(polys, n=10^5, type="random")
# Micro-benchmarking
# Test only for those points falling in first square
res <- microbenchmark(TF1 <- gIntersects_tst(pts2, polys[1,]),
TF2 <- gIntersects_tst(polys[1,], pts2),
idx <- over_tst(pts2, polys[1,]),
codes <- point.in.polygon_tst(pts2, polys[1,]))
print(res)
## Unit: milliseconds
## expr min lq
## TF1 <- gIntersects_tst(pts2, polys[1, ]) 151.35336 165.23526
## TF2 <- gIntersects_tst(polys[1, ], pts2) 123.26241 135.90883
## idx <- over_tst(pts2, polys[1, ]) 54.84891 63.89454
## codes <- point.in.polygon_tst(pts2, polys[1, ]) 9.39330 10.66513
## mean median uq max neval cld
## 189.67848 177.62808 190.89566 365.92728 100 d
## 157.47151 148.50073 160.37567 314.02700 100 c
## 76.42608 70.66998 79.81225 240.55570 100 b
## 14.09199 11.37738 16.88741 46.19245 100 a
ggplot2::autoplot(res) + ggtitle("square polygon: 100 evaluations")
pts2.gI1 <- pts2[TF1,]
pts2.gI2 <- pts2[TF2,]
pts2.ovr <- pts2[!is.na(idx),]
pts2.PiP <- pts2[as.logical(codes),]
# All subsets are identical
identical(pts2.gI1, pts2.gI2)
## [1] TRUE
identical(pts2.gI2, pts2.ovr)
## [1] TRUE
identical(pts2.ovr, pts2.PiP)
## [1] TRUE
Session Info
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] raster_2.5-8 mapview_1.2.0 leaflet_1.1.0 maptools_0.9-2 maps_3.1.1
[6] ggplot2_2.2.1 microbenchmark_1.4-2.1 sp_1.2-4 rgeos_0.3-23
Upvotes: 2