aterhorst
aterhorst

Reputation: 685

Adjust centroids spatial polygons using sf

I have a shapefile of local government regions. I use sf_read() to import it into R as an SF object. I want to compute the distance between the local government regions. st_centroid() gives me polygon centroids and I can compute distance using st_distance().

regions <- st_read("~/Downloads/regions.shp")

regions_with_centroids <- st_centroid(regions)

extract_centroids <- regions_with_centroids %>% 
                           st_drop_geometry() %>% 
                           as_tibble() %>%
                           select(region_name, centroid)
# create edge list

edge_list <- extract_centroids %>%
                           select(region_name) %>%
                           expand(from = region_name, to = region_name) %>% 
                           filter(from < to) %>%
                           left_join(extract-centroids, by = c("from" = "region_name) %>%
                           rename(from_centroid = centroid) %>%
                           left_join(extract-centroids, by = c("to" = "region_name) %>%
                           rename(to_centroid = centroid) %>%
                           mutate(distance = st_distance(from_centroid, to_centroid)

However, I really want to analyze the commuting distance between major urban areas in each government region. I need to shift the centroids to the population "centre of gravity".

I can use a shapefile of census enumerator areas to help me with this. The enumerator areas are sized by population. Using st_intersection() I can intersect the enumerator areas with the government regions. This gives me sub-regions within each government region. I can compute centroids for all the sub-regions. Grouping by region, I can compute the mean centroid for all the sub-regions in a region. The mean centroid = "centre of gravity", which gives a more realistic commute distance between regions.

regions <- st_read("~/Downloads/regions.shp")

ea <- st_read("~/Downloads/enumerator_areas.shp")

intersected <- st_intersection(regions, ea)

sub_region_centroids <- st_centroids(intersected)

Where I run into difficulty is how to find the mean centroid. Grouping by region is not working.

mean_centroid <- sub_region_centroids %>%
                          group_by(region_name) %>%
                          summarise(mean_centroid = mean(geometry))

Warning messages:
1: In mean.default(geometry) :
  the argument is not numeric or logical: returning NA

Where am I going wrong?

I also do not know how to add the mean centroid back to the original region's object.

I hope someone can assist me.

Upvotes: 3

Views: 1266

Answers (2)

nniloc
nniloc

Reputation: 4243

Following @Jindra Lacko's nice example, here is how it can be done taking the weighted mean of the lat and long.

library(sf)
library(dplyr)
library(ggplot2)

# weighted mean of lat and long
center_weighted <- cities %>%
  mutate(lon = sf::st_coordinates(.)[,1],
         lat = sf::st_coordinates(.)[,2]) %>%
  st_drop_geometry() %>%
  summarize(across(c(lon, lat), weighted.mean, w = population)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)


# plot it
ggplot() +
  geom_sf(data = shape, color = "gray75") + 
  geom_sf(data = cities, color = "red") + 
  geom_sf(data = center_weighted, color = "blue", pch = 4)


Data

# set up example data
shape <- st_read(system.file("shape/nc.shp", package="sf")) 
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                     x = c(-78.633333, -79.819444, -77.912222),
                     y = c(35.766667, 36.08, 34.223333),
                     population = c(467665, 299035, 115451)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

Upvotes: 4

Jindra Lacko
Jindra Lacko

Reputation: 8719

Computing a population weighted average of multiple centroids is an interesting problem.

You can consider approach like this - where I calculate the weighted centroid of three cities in North Carolina (to make use of the well known & much loved nc.shp file that ships with {sf}).

The workflow uses tidyr::uncount() to first multiply the city points per population, the (many) multiplied points are then united to a single multipoint feature. And multipoint features have defined sf::st_centroid() operation (QED). The final sf::st_as_sf() is just a polish.

library(sf)
library(dplyr)
library(ggplot2)

# included with sf package
shape <- st_read(system.file("shape/nc.shp", package="sf")) 

# dramatis personae; population as per Wikipedia
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                  x = c(-78.633333, -79.819444, -77.912222),
                  y = c(35.766667, 36.08, 34.223333),
                  population = c(467665, 299035, 115451)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

# a quick overview of facts on ground 
ggplot() +
   geom_sf(data = shape) + # polygon of North Carolina
   geom_sf(data = cities, color = "red") # 3 cities
   

three cities in NC

# unweighted centroid / a baseline    
plain_center <- cities %>% 
  st_geometry() %>% # pull just geometry
  st_combine() %>%  # from many points to a single multipoint
  st_centroid() %>% # compute centroid of the multipoint
  st_as_sf() # make it a sf object again

# the fun is here!!
center_of_centers <- cities %>% 
  tidyr::uncount(population) %>% # multiply rows according to population
  st_geometry() %>% # pull just geometry
  st_combine() %>%  # from many points to a single multipoint
  st_centroid() %>% # compute centroid of the multipoint
  st_as_sf() # make it a sf object again

# finished result
ggplot() +
  geom_sf(data = shape, color = "gray75") + # polygon of North Carolina
  geom_sf(data = cities, color = "red") + # 3 cities
  geom_sf(data = plain_center, color = "green") + # unweighted center
  geom_sf(data = center_of_centers, color = "blue", pch = 4) # population weighted center

corrected map / with weighted and unweighted centroids both

Upvotes: 4

Related Questions