Reputation: 665

Testing for whether geometries include other when touching borders in R

I am new to geodata so maybe this is a silly question. I have two multipolygons and what I want to do is to test in which of the polygon of "constituencies" each polygon of "wards" is contained. Basically, to see for each ward in which consituency it is in.

However, I am having issues with the function st_within() from sf because it only returns TRUE if the first geometry is completely within the second geometry. That is a problem when the smaller units touch the borders of the larger ones. Here is an example

I have one multipolygon called "consituencies" and and one called "wards". You can find them here: https://www.dropbox.com/sh/xu08wm79rym00zz/AADFDpyPe0EuDSDSY-6SvqpYa?dl=0

There's 2 objects in constituencies:

plot(constituencies$geometry)

I then try to join them on whether the wards are within the constituencies

wards <- st_transform(wards, crs = st_crs(constituencies))
test98 <- st_join(wards, constituencies, join=st_within)

And the outcome spits out many NAs on the constituencies' name and label a ward is within of.

    head(test98)
Simple feature collection with 6 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 531936.6 ymin: 180569.5 xmax: 533595.1 ymax: 182082.5
projected CRS:  OSGB 1936 / British National Grid
  WD98CD       WD98NM                             name label                       geometry
1 00AAFA   Aldersgate                             <NA>  <NA> MULTIPOLYGON (((532104.9 18...
2 00AAFB      Aldgate                             <NA>  <NA> MULTIPOLYGON (((533319.2 18...
3 00AAFC    Bassishaw Cities of London and Westminster   107 MULTIPOLYGON (((532587.3 18...
4 00AAFD Billingsgate Cities of London and Westminster   107 MULTIPOLYGON (((533167.9 18...
5 00AAFE  Bishopsgate                             <NA>  <NA> MULTIPOLYGON (((533410.7 18...
6 00AAFF Bread Street Cities of London and Westminster   107 MULTIPOLYGON (((532300.3 18...

When I plot them to see which ones have NA in name (in red), you can see it is the ones touching the consituency's borders:

plot(constituencies_short$geometry)
plot(test98$geometry[!is.na(test98$name),], col = "green", add = T)
plot(test98$geometry[is.na(test98$name),], col = "red", add = T)

I assume one of two things is happening: 1) touching borders dos not count as being completely within a geometry, or 2) the borders of the wards and constituencies may not be perfectly matched so the wards aren't entirely within the constituency.

My question is: is there a way of perfectly match the borders and then test which wards are within which constituencies OR is there a way of testing within which constituencies is not the entirety but the majority of a ward's boundaries?

Thank you so much!

Upvotes: 3

Answers (2)

ktiu

Reputation: 2626

Since we are dealing with only slight differences, one approach would be to create a slight negative / inset buffer around the wards geometries before joining.

Prep

library(sf)
library(tmap)

wards_transformed <- st_transform(wards, st_crs(constituencies))

Without buffer:

wards_transformed %>%
  st_join(constituencies, st_within)

With negative buffer:

wards_transformed %>%
  st_buffer(-10) %>%
  st_join(constituencies, st_within)

Upvotes: 3

Spacedman

Reputation: 94202

Your boundaries do not align exactly. Let's see what happens when we intersect the first ward with the two constituencies:

> st_intersection(wards$geom[1], constituencies$geom)
Geometry set for 2 features 
geometry type:  GEOMETRY
dimension:      XY
bbox:           xmin: 531936.6 ymin: 181262.6 xmax: 532309 ymax: 182012
projected CRS:  OSGB 1936 / British National Grid
POLYGON ((532104.9 182011.9, 532126.3 181948, 5...
MULTIPOLYGON (((532022.4 181893.5, 532022.4 181...

we get two output features because the ward has some of it in both constituencies. How much? Let's see:

> st_area(st_intersection(wards$geom[1], constituencies$geom))
Units: [m^2]
[1] 1.298829e+05 9.542473e-01

So that's 129882m^2 in one and 0.95m^2 in the other. Your boundaries misalign so that there's 1 square metre of the ward in the "wrong" constituency!

A fix would be to compute the areas of the full intersection set and threshold on a value or proportion of the ward area.

You can use st_intersects to get a list of all overlaps:

> st_intersects(wards, constituencies)

Sparse geometry binary predicate list of length 25, where the predicate was `intersects'
first 10 elements:
 1: 1, 2
 2: 1
 3: 1
 4: 1
 5: 1, 2
 6: 1

Any elements of that list with one element are wards that overlap one and only one constituency, otherwise you need to compute the area proportions of the overlap of that ward with those constituencies to work out which ones the ward is mostly over.

Upvotes: 4

Testing for whether geometries include other when touching borders in R

Answers (2)

Prep

Without buffer:

With negative buffer:

Related Questions