Reputation: 665
I am new to geodata so maybe this is a silly question. I have two multipolygons and what I want to do is to test in which of the polygon of "constituencies" each polygon of "wards" is contained. Basically, to see for each ward in which consituency it is in.
However, I am having issues with the function st_within()
from sf
because it only returns TRUE if the first geometry is completely within the second geometry. That is a problem when the smaller units touch the borders of the larger ones. Here is an example
I have one multipolygon called "consituencies" and and one called "wards". You can find them here: https://www.dropbox.com/sh/xu08wm79rym00zz/AADFDpyPe0EuDSDSY-6SvqpYa?dl=0
There's 2 objects in constituencies:
plot(constituencies$geometry)
I then try to join them on whether the wards are within the constituencies
wards <- st_transform(wards, crs = st_crs(constituencies))
test98 <- st_join(wards, constituencies, join=st_within)
And the outcome spits out many NAs on the constituencies' name and label a ward is within of.
head(test98)
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 531936.6 ymin: 180569.5 xmax: 533595.1 ymax: 182082.5
projected CRS: OSGB 1936 / British National Grid
WD98CD WD98NM name label geometry
1 00AAFA Aldersgate <NA> <NA> MULTIPOLYGON (((532104.9 18...
2 00AAFB Aldgate <NA> <NA> MULTIPOLYGON (((533319.2 18...
3 00AAFC Bassishaw Cities of London and Westminster 107 MULTIPOLYGON (((532587.3 18...
4 00AAFD Billingsgate Cities of London and Westminster 107 MULTIPOLYGON (((533167.9 18...
5 00AAFE Bishopsgate <NA> <NA> MULTIPOLYGON (((533410.7 18...
6 00AAFF Bread Street Cities of London and Westminster 107 MULTIPOLYGON (((532300.3 18...
When I plot them to see which ones have NA in name (in red), you can see it is the ones touching the consituency's borders:
plot(constituencies_short$geometry)
plot(test98$geometry[!is.na(test98$name),], col = "green", add = T)
plot(test98$geometry[is.na(test98$name),], col = "red", add = T)
I assume one of two things is happening: 1) touching borders dos not count as being completely within a geometry, or 2) the borders of the wards and constituencies may not be perfectly matched so the wards aren't entirely within the constituency.
My question is: is there a way of perfectly match the borders and then test which wards are within which constituencies OR is there a way of testing within which constituencies is not the entirety but the majority of a ward's boundaries?
Thank you so much!
Upvotes: 3
Views: 831
Reputation: 2626
Since we are dealing with only slight differences, one approach would be to create a slight negative / inset buffer around the wards geometries before joining.
library(sf)
library(tmap)
wards_transformed <- st_transform(wards, st_crs(constituencies))
wards_transformed %>%
st_join(constituencies, st_within)
wards_transformed %>%
st_buffer(-10) %>%
st_join(constituencies, st_within)
Upvotes: 3
Reputation: 94202
Your boundaries do not align exactly. Let's see what happens when we intersect the first ward with the two constituencies:
> st_intersection(wards$geom[1], constituencies$geom)
Geometry set for 2 features
geometry type: GEOMETRY
dimension: XY
bbox: xmin: 531936.6 ymin: 181262.6 xmax: 532309 ymax: 182012
projected CRS: OSGB 1936 / British National Grid
POLYGON ((532104.9 182011.9, 532126.3 181948, 5...
MULTIPOLYGON (((532022.4 181893.5, 532022.4 181...
we get two output features because the ward has some of it in both constituencies. How much? Let's see:
> st_area(st_intersection(wards$geom[1], constituencies$geom))
Units: [m^2]
[1] 1.298829e+05 9.542473e-01
So that's 129882m^2 in one and 0.95m^2 in the other. Your boundaries misalign so that there's 1 square metre of the ward in the "wrong" constituency!
A fix would be to compute the areas of the full intersection set and threshold on a value or proportion of the ward area.
You can use st_intersects
to get a list of all overlaps:
> st_intersects(wards, constituencies)
Sparse geometry binary predicate list of length 25, where the predicate was `intersects'
first 10 elements:
1: 1, 2
2: 1
3: 1
4: 1
5: 1, 2
6: 1
Any elements of that list with one element are wards that overlap one and only one constituency, otherwise you need to compute the area proportions of the overlap of that ward with those constituencies to work out which ones the ward is mostly over.
Upvotes: 4