CoderGuy123
CoderGuy123

Reputation: 6649

How to calculate the length of borders between countries in R?

In some analyses, it makes sense to use border length as a measure of cultural distance between countries, the idea being that countries that share larger proportions of their borders are more culturally close. This then raises the question of how to compute this. We can grab a shapefile of the world from naturalearthdata.com which covers some 251 units (i.e. they are not all sovereign).

I looked over the methods in the Geocomputation with R ebook website and it seems like an intersection is closest to what we want, i.e. st_intersection(), while st_touches() finds the neighbors without giving any sense of the border length. However, when I try it out on two neighbors, Denmark and Germany, I get no overlap:

> suppressWarnings(library(sf))
Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3; sf_use_s2() is TRUE
> world = read_sf("data/ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp")
> #make valid otherwise get an error
> world = st_make_valid(world)
> #which countries touch each other by border (of the polygons)
> neighbor_ids = st_touches(
+   world$geometry,
+   world$geometry
+ )
> #Denmark Germany
> (germany_idx = which(world$ADMIN=="Germany"))
[1] 50
> (denmark_idx = which(world$ADMIN=="Denmark"))
[1] 71
> world$ADMIN[neighbor_ids[germany_idx][[1]]]
[1] "France"      "Czechia"     "Luxembourg"  "Belgium"     "Denmark"     "Poland"      "Austria"     "Switzerland" "Netherlands"
> world$ADMIN[neighbor_ids[denmark_idx][[1]]]
[1] "Germany"
> #border intersection
> #Denmark Germany border as test
> st_intersection(
+   world$geometry[germany_idx],
+   world$geometry[denmark_idx]
+ )
Geometry set for 0 features 
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
CRS:           4326

How does one get the border lengths? According to Wikipedia, it should be 68 km.

It seems that what is needed is to tell st_intersection() to include the line at the border. By default, this 1 point overlap is ignored, I guess because it has a 0 area. This functionality is controlled by the ... which forwards to s2_options(). The right parameter is model, which defaults to "open", but should be "closed". Thus:

> #include the line
> st_intersection(
+   world$geometry[germany_idx],
+   world$geometry[denmark_idx],
+   model = "closed"
+ )
Geometry set for 1 feature 
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 8.660776 ymin: 54.80162 xmax: 9.437503 ymax: 54.9059
CRS:           4326
MULTILINESTRING ((9.436922 54.81014, 9.422143 5...

To get the length, just add on st_length():

> st_intersection(
+   world$geometry[germany_idx],
+   world$geometry[denmark_idx],
+   mode = "closed"
+ ) %>% st_length()
111563 [m]

Only the result is wrong! The scaling is off by factor of 1.64 or so.

Potential problems:

enter image description here

Upvotes: 2

Views: 405

Answers (1)

Jindra Lacko
Jindra Lacko

Reputation: 8699

This is an interesting problem; I believe the coastline paradox plays a role, but only a minor one. The chief issue seems to be driven by CRS.

Let me illustrate on three examples using the world dataset provided by GISCO (i.e. Eurostat). I like this dataset as it allows several levels of precision.

Compare these with the official, i.e. wikipedia length of 68 kilometers.

The rough map is off by about 1/6th, which is to be expected given the low resolution. The fine map is quite close (7% off), and you could expect the actual length to increase yet more, as 1:1M is still a coarse map.

On the other hand the length of the same fine map as in previous example, but projected in WGS84, is off by a factor of two, as you observed.

library(sf)
library(dplyr)
library(giscoR)

# rough line / resolution 1:60 000 000 
rough <- gisco_get_countries(resolution = "60",
                             epsg = 3035,
                             country = c("DE", "DK"))

plot(st_geometry(rough))

map of germany and denmark in coarse resolution

sf::st_intersection(rough[1, ], rough[2, ], model = "closed") %>%
  mutate(border_length = st_length(.)) %>% 
  pull(border_length)
# 53141.03 [m]

# fine line / resolution 1:1 000 000 
fine <- gisco_get_countries(resolution = "01",
                            epsg = 3035,
                            country = c("DE", "DK"))

plot(st_geometry(fine))

map of germany and denmark in fine resolution

sf::st_intersection(fine[1, ], fine[2, ], model = "closed") %>%
  mutate(border_length = st_length(.)) %>% 
  pull(border_length)
# 63795.4 [m]

# fine line in WGS84
fine_wgs <- gisco_get_countries(resolution = "01",
                            epsg = 4326,
                            country = c("DE", "DK"))


sf::st_intersection(fine_wgs[1, ], fine_wgs[2, ], model = "closed") %>%
  mutate(border_length = st_length(.)) %>% 
  pull(border_length)
# 127223 [m]

EDIT (2022-09-12) on second thought this seems to be affected by the behavior of the S2 engine behind {sf} (turning it off via sf_use_s2(FALSE) leads to more reasonable length of borders even for data projected in WGS84).

I will raise it as an issue with {sf} maintainers, as it does not seem likely that this is an expected behaviour.

Upvotes: 2

Related Questions