Reputation: 321
I'm sure this question has been answered elsewhere, but I have not been able to come up with it by searching.
I have points representing cities within a country along with population for each city. I also have a polygon file of counties. I want to find the location of the largest city within each county.
How can this be done?
Here is some data
structure(list(Country = c("us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us",
"us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us"), City =
c("cabarrus", "cox store", "cal-vel", "briarwood townhouses", "barker heights", "davie
crossroads", "crab point village", "azalea", "chesterfield", "charlesmont", "connor", "clover garden", "corriher heights", "callisons", "crestview acres", "clegg", "canaan park", "chantilly", "belgrade", "brices crossroads", "bluff", "butner", "bottom", "bandy", "bostian heights"), AccentCity = c("Cabarrus", "Cox Store", "Cal-Vel", "Briarwood Townhouses", "Barker Heights", "Davie Crossroads", "Crab Point Village", "Azalea", "Chesterfield", "Charlesmont", "Connor", "Clover Garden", "Corriher Heights", "Callisons", "Crestview Acres", "Clegg", "Canaan Park", "Chantilly", "Belgrade", "Brices Crossroads", "Bluff", "Butner", "Bottom", "Bandy", "Bostian Heights"), Region = c("NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC"), Population = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, A_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_), Latitude = (35.2369444, 35.275, 36.4291667, 35.295, 35.3111111, 35.8319444, 34.7602778, 35.58, 35.81, 5.9341667,
35.7419444, 36.1883333, 35.5605556, 35.0841667, 35.0213889, 35.8655556, 36.2761111, 36.3016667, 34.88, 34.8186111, 35.8377778, 36.1319444, 36.4747222, 35.6419444, 35.7544444), Longitude = c(-80.5419444, -82.0352778, -78.9694444, -81.5238889, -82.4441667, -80.535, -76.7305556, -82.4713889, -81.6611111, -81.5127778, -78.1486111, -79.4630556, -80.635, -76.7255556, -80.5427778, -78.8497222,
-79.7852778, -76.1711111, -77.2352778, -78.1016667, -82.8580556, -78.7569444, -80.7741667, -81.09, -80.9294444)), .Names = c("Country", "City", "AccentCity", "Region", "Population", "Latitude", "Longitude"), row.names = c(544L, 889L, 551L, 434L, 190L, 975L, 894L, 147L,
717L, 700L, 831L, 773L, 862L, 559L, 915L, 753L, 584L, 695L, 262L, 437L, 372L, 537L, 406L, 178L, 02L), class = "data.frame")
and some code to read in north carolina
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
plot(xx)
I want to find the city with the maximum population within each county. i'm sorry I don't have a reproducible example. If I did, I would have the answer!
Upvotes: 0
Views: 657
Reputation: 59415
The short answer is that you should use gContains(...)
in package rgeos
.
Here is the long answer.
In the code below, we grab a high resolution shapefile of North Carolina counties from the GADM database, and a geocoded dataset of North Carolina cities from from the US Geological Survey database. The latter already has county information but we ignore that. Then we map cities to their appropriate county using gContains(...)
, add that information to the cities data frame, and identify the largest city in each county using the data.table package. Most of the work is in 4 lines of code near the end.
library(raster) # for getData(...); you may not need this
library(foreign) # for read.dbf(...); you may not need this
library(rgeos) # for gContains(...); loads package sp as well
setwd("< directory for downloaded data >")
# get North Carolina Counties shapefile from GADM database
USA <- getData("GADM",country="USA",level=2) # level 2 is counties
NC.counties <- USA[USA$NAME_1=="North Carolina",] # North Carolina Counties
# get North Carolina Cities data from USGS database
url <- "http://dds.cr.usgs.gov/pub/data/nationalatlas/citiesx010g_shp_nt00962.tar.gz"
download.file(url,"cities.tar.gz")
untar("cities.tar.gz")
data <- read.dbf("citiesx010g.dbf",as.is=TRUE)
NC.data <- data[data$STATE=="NC",c("NAME","COUNTY","LATITUDE","LONGITUDE","POP_2010")]
## --- evverything up to here is just to set up the example
# convert cities data.frame to SpatialPointsDataFrame
NC.cities <- SpatialPointsDataFrame(NC.data[,c("LONGITUDE","LATITUDE")],
data=NC.data,
proj4string=CRS(proj4string(NC.counties)))
# map cities to counties
city.cnty <- gContains(NC.counties,NC.cities,byid=TRUE)
# add county information to cities data
NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))
# identify largest city in each county
library(data.table)
result <- setDT(NC.data)[,.SD[which.max(POP_2010)],by="county"]
head(result)
# county NAME COUNTY LATITUDE LONGITUDE POP_2010
# 1: Jackson Cullowhee Jackson 35.31371 -83.17653 6228
# 2: Graham Robbinsville Graham 35.32287 -83.80740 620
# 3: Wilkes North Wilkesboro Wilkes 36.15847 -81.14758 4245
# 4: Rowan Salisbury Rowan 35.67097 -80.47423 33662
# 5: Buncombe Asheville Buncombe 35.60095 -82.55402 83393
# 6: Wayne Goldsboro Wayne 35.38488 -77.99277 36437
The workhorse here is the line:
city.cnty <- gContains(NC.counties,NC.cities,byid=TRUE)
This compares every point in the SpatialPointsDataFrame NC.Cities
to every Polygon in the SpatialPolygonsDataFrame NC.counties
and returns a logical matrix where tthe rows represent cities and the columns represent counties, and the [i,j]
element is TRUE
if city i
is in county j
, FALSE
otherwise. We process the matrix row-wise in the next statement:
NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))
using each row in succession to index the attributes table for NC.counties
to extract the county name.
The data you provided in your question has some problems which are nevertheless instructive. First, the NC shapefile in the maptools
package is relatively low resolution. In particular this means that some of the coastal islands are completely missing, so any city on one of those islands will not map to a county. You might have the same problem with your real data so watch out for it.
Second, comparing the COUNTY
column in the original USGS dataset with the county
column which we added, there are 3 (out of 865) counties that do not agree. It turns out that, in those cases, the USGS database was wrong (or out of date). You might have the same problem so watch out for that too.
Third, an additional three cities did not map to any county. These were all coastal cities and probably reflect small inaccuracies in the North Carolina shapefile. You night have this problem as well.
Upvotes: 0