Reputation: 685
I have a shapefile of local government regions. I use sf_read()
to import it into R as an SF object. I want to compute the distance between the local government regions. st_centroid()
gives me polygon centroids and I can compute distance using st_distance()
.
regions <- st_read("~/Downloads/regions.shp")
regions_with_centroids <- st_centroid(regions)
extract_centroids <- regions_with_centroids %>%
st_drop_geometry() %>%
as_tibble() %>%
select(region_name, centroid)
# create edge list
edge_list <- extract_centroids %>%
select(region_name) %>%
expand(from = region_name, to = region_name) %>%
filter(from < to) %>%
left_join(extract-centroids, by = c("from" = "region_name) %>%
rename(from_centroid = centroid) %>%
left_join(extract-centroids, by = c("to" = "region_name) %>%
rename(to_centroid = centroid) %>%
mutate(distance = st_distance(from_centroid, to_centroid)
However, I really want to analyze the commuting distance between major urban areas in each government region. I need to shift the centroids to the population "centre of gravity".
I can use a shapefile of census enumerator areas to help me with this. The enumerator areas are sized by population. Using st_intersection()
I can intersect the enumerator areas with the government regions. This gives me sub-regions within each government region. I can compute centroids for all the sub-regions. Grouping by region, I can compute the mean centroid for all the sub-regions in a region. The mean centroid = "centre of gravity"
, which gives a more realistic commute distance between regions.
regions <- st_read("~/Downloads/regions.shp")
ea <- st_read("~/Downloads/enumerator_areas.shp")
intersected <- st_intersection(regions, ea)
sub_region_centroids <- st_centroids(intersected)
Where I run into difficulty is how to find the mean centroid. Grouping by region is not working.
mean_centroid <- sub_region_centroids %>%
group_by(region_name) %>%
summarise(mean_centroid = mean(geometry))
Warning messages:
1: In mean.default(geometry) :
the argument is not numeric or logical: returning NA
Where am I going wrong?
I also do not know how to add the mean centroid back to the original region's object.
I hope someone can assist me.
Upvotes: 3
Views: 1266
Reputation: 4243
Following @Jindra Lacko's nice example, here is how it can be done taking the weighted mean of the lat and long.
library(sf)
library(dplyr)
library(ggplot2)
# weighted mean of lat and long
center_weighted <- cities %>%
mutate(lon = sf::st_coordinates(.)[,1],
lat = sf::st_coordinates(.)[,2]) %>%
st_drop_geometry() %>%
summarize(across(c(lon, lat), weighted.mean, w = population)) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)
# plot it
ggplot() +
geom_sf(data = shape, color = "gray75") +
geom_sf(data = cities, color = "red") +
geom_sf(data = center_weighted, color = "blue", pch = 4)
Data
# set up example data
shape <- st_read(system.file("shape/nc.shp", package="sf"))
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
x = c(-78.633333, -79.819444, -77.912222),
y = c(35.766667, 36.08, 34.223333),
population = c(467665, 299035, 115451)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
Upvotes: 4
Reputation: 8719
Computing a population weighted average of multiple centroids is an interesting problem.
You can consider approach like this - where I calculate the weighted centroid of three cities in North Carolina (to make use of the well known & much loved nc.shp
file that ships with {sf}
).
The workflow uses tidyr::uncount()
to first multiply the city points per population, the (many) multiplied points are then united to a single multipoint feature. And multipoint features have defined sf::st_centroid()
operation (QED). The final sf::st_as_sf()
is just a polish.
library(sf)
library(dplyr)
library(ggplot2)
# included with sf package
shape <- st_read(system.file("shape/nc.shp", package="sf"))
# dramatis personae; population as per Wikipedia
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
x = c(-78.633333, -79.819444, -77.912222),
y = c(35.766667, 36.08, 34.223333),
population = c(467665, 299035, 115451)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# a quick overview of facts on ground
ggplot() +
geom_sf(data = shape) + # polygon of North Carolina
geom_sf(data = cities, color = "red") # 3 cities
# unweighted centroid / a baseline
plain_center <- cities %>%
st_geometry() %>% # pull just geometry
st_combine() %>% # from many points to a single multipoint
st_centroid() %>% # compute centroid of the multipoint
st_as_sf() # make it a sf object again
# the fun is here!!
center_of_centers <- cities %>%
tidyr::uncount(population) %>% # multiply rows according to population
st_geometry() %>% # pull just geometry
st_combine() %>% # from many points to a single multipoint
st_centroid() %>% # compute centroid of the multipoint
st_as_sf() # make it a sf object again
# finished result
ggplot() +
geom_sf(data = shape, color = "gray75") + # polygon of North Carolina
geom_sf(data = cities, color = "red") + # 3 cities
geom_sf(data = plain_center, color = "green") + # unweighted center
geom_sf(data = center_of_centers, color = "blue", pch = 4) # population weighted center
Upvotes: 4