Reputation: 53
I recently attempted a regionalization analysis on a set of geographic regions, each with multiple attributes (A1, A2, A3, ...). The goal differs from a regular regionalization problem (for example, one solved with K-means), where you define groups with minimal within-group dissimilarity and maximal between-group dissimilarity.
My regionalization is the opposite: I want the groups to be as similar to one another as possible in terms of means, variances, and other statistics (units within a group do not have to be as dissimilar as possible; that is of less concern). I came across the minDiff package and its successor, anticlust, in R, and they do the job wonderfully except for one problem: since this is a regionalization problem, I want the final groups to be geographically connected. Results from minDiff/anticlust, however, show the groups mixed with one another all over the map. Here is some sample code:
A data frame containing the geographic units and their attributes is read from a shapefile and stored in geo.df.
library(sf)
library(anticlust)
geo.df <- as.data.frame(read_sf(dsn = getwd(), layer = "geolayer", stringsAsFactors = FALSE))
geo.df$class <- anticlustering(geo.df[, c("A1", "A2", "A3", "A4", ..., "An")], K = 5, objective = "variance", standardize = TRUE)
I've tried adding coordinates and pairwise distances to the list of attributes (A1, A2, ..., An), but neither worked. I always ended up with well-separated groups that were mixed with one another in geographic space.
Any pointers on how to proceed from here? Any hints will be greatly appreciated.
Thank you all in advance.
Upvotes: 1
Views: 124
Reputation: 486
This is a classic regionalization problem. You can solve it with the SKATER algorithm. Since you haven't provided a reproducible example, I can't provide code that works on your data.
Use the spdep library and its skater() function. The example below runs on the bhicv shapefile that ships with spdep.
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(spdep)
#> Loading required package: sp
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
bh <- st_read(system.file("etc/shapes/bhicv.shp",
package="spdep")[1], quiet=TRUE)
dpad <- data.frame(scale(as.data.frame(bh)[,5:8]))
### neighbourhood list
bh.nb <- poly2nb(bh)
### calculating costs
lcosts <- nbcosts(bh.nb, dpad)
### making listw
nb.w <- nb2listw(bh.nb, lcosts, style="B")
### find a minimum spanning tree
mst.bh <- mstree(nb.w,5)
### five groups (4 cuts) with no restriction
res1 <- skater(mst.bh[,1:2], dpad, 4)
plot(st_geometry(bh), col = res1$groups)
Created on 2022-08-18 by the reprex package (v2.0.1)
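Since the question asks for groups that are similar in their means and variances, it may be worth checking the result for that. The sketch below is one way to do it in base R. The helper `summarise_groups()` and the toy data are hypothetical, introduced only for illustration; they are not part of spdep or anticlust.

```r
# Hypothetical helper (not part of spdep or anticlust): per-group size,
# mean and standard deviation of every attribute column.
summarise_groups <- function(attrs, groups) {
  attrs <- as.data.frame(attrs)
  list(
    size = table(groups),
    mean = aggregate(attrs, by = list(group = groups), FUN = mean),
    sd   = aggregate(attrs, by = list(group = groups), FUN = sd)
  )
}

# Toy stand-in data so the sketch runs on its own
set.seed(1)
toy_attrs   <- data.frame(A1 = rnorm(20), A2 = rnorm(20))
toy_groups  <- rep(1:4, each = 5)
toy_summary <- summarise_groups(toy_attrs, toy_groups)
print(toy_summary$mean)
```

With the objects from the example above, the call would be `summarise_groups(dpad, res1$groups)`. SKATER builds contiguous groups of neighbouring units with similar attributes, so the group means may differ quite a bit from one another; this summary shows whether they do.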
Upvotes: 0