Reputation: 40146
I could get at my goals "the long way" but am hoping to stay completely within R. I am looking to append Census demographic data by zip code to records in my database. I know that R has a few Census-based packages, but, unless I am missing something, these data do not seem to exist at the zip code level, nor is it intuitive to merge onto an existing data frame.
In short, is it possible to do this within R, or is my best approach to grab the data elsewhere and read it into R?
Any help will be greatly appreciated!
Upvotes: 6
Views: 4760
Reputation: 305
As others in this thread have mentioned, the Census Bureau American FactFinder is a free source of comprehensive and detailed data. Unfortunately, it’s not particularly easy to use in its raw format.
We’ve pulled, cleaned, consolidated, and reformatted the Census Bureau data. The details of this process and how to use the data files can be found on our team blog.
None of these tables actually has a field called “ZIP code.” Rather, they have a field called “ZCTA5”. A ZCTA5 (or ZCTA) can be thought of as interchangeable with a ZIP code, given the following caveats:
Upvotes: 3
Reputation: 37
Here is a simple for loop to get ZIP-level population. You need to get an API key first, and it only covers the US for now.
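library(data.table)
library(jsonlite)

# ziplist: character vector of 5-digit ZIP codes; replace YOURKEYHERE with your own app token
results <- vector("list", length(ziplist))
for (z in seq_along(ziplist)) {
  print(z)  # progress indicator
  url <- paste0("http://api.opendatanetwork.com/data/v1/values?variable=demographics.population.count&entity_id=8600000US", ziplist[z], "&forecast=3&describe=false&format=&app_token=YOURKEYHERE")
  resp <- try(fromJSON(url), silent = TRUE)  # fetch once and reuse the result
  if (inherits(resp, "try-error")) next      # skip ZIP codes whose request fails
  dt <- as.data.table(resp$data)
  zipcode <- dt[[2]][1]  # row 1 is treated as a header; column 2 identifies the entity returned
  dt <- dt[-1]           # drop that header row
  setnames(dt, c("Year", "Population", "Forecasted"))
  dt[, ZipCodeQuery := zipcode]
  dt[, ZipCodeData := ziplist[z]]
  results[[z]] <- dt
}
masterdata <- rbindlist(results)  # NULL entries from skipped ZIP codes are ignored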
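One thing to watch with a loop like this: if ziplist was read in as numbers, a ZIP code such as 01001 loses its leading zero and the entity ID will no longer match. A minimal sketch of a helper that pads the code and builds the request URL, assuming the same endpoint and query parameters as in the loop (build_odn_url is a hypothetical name, not part of any package):

```r
# Hypothetical helper: pad a ZIP code to five digits and build the request URL.
# The endpoint and query parameters are copied from the loop above.
build_odn_url <- function(zip, app_token) {
  zip <- sprintf("%05d", as.integer(zip))  # 1001 -> "01001"
  paste0(
    "http://api.opendatanetwork.com/data/v1/values",
    "?variable=demographics.population.count",
    "&entity_id=8600000US", zip,
    "&forecast=3&describe=false&format=&app_token=", app_token
  )
}

build_odn_url(1001, "YOURKEYHERE")
```

The helper only builds the string, so you can inspect the URLs before spending any API calls.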
Upvotes: 0
Reputation: 1798
I just wrote an R package called totalcensus (https://github.com/GL-Li/totalcensus), with which you can easily extract any data from the decennial census and the ACS.
For this old question, in case you still care: you can get total population (by default) and the population of individual races from the national data of the 2010 decennial census or the 2015 ACS 5-year survey.
From 2015 ACS 5-year survey. Download national data with download_census("acs5year", 2015, "US")
and then:
zip_acs5 <- read_acs5year(
year = 2015,
states = "US",
geo_headers = "ZCTA5",
table_contents = c(
"white = B02001_002",
"black = B02001_003",
"asian = B02001_005"
),
summary_level = "860"
)
# GEOID lon lat ZCTA5 state population white black asian GEOCOMP SUMLEV NAME
# 1: 86000US01001 -72.62827 42.06233 01001 NA 17438 16014 230 639 all 860 ZCTA5 01001
# 2: 86000US01002 -72.45851 42.36398 01002 NA 29780 23333 1399 3853 all 860 ZCTA5 01002
# 3: 86000US01003 -72.52411 42.38994 01003 NA 11241 8967 699 1266 all 860 ZCTA5 01003
# 4: 86000US01005 -72.10660 42.41885 01005 NA 5201 5062 40 81 all 860 ZCTA5 01005
# 5: 86000US01007 -72.40047 42.27901 01007 NA 14838 14086 104 330 all 860 ZCTA5 01007
# ---
# 32985: 86000US99923 -130.04103 56.00232 99923 NA 13 13 0 0 all 860 ZCTA5 99923
# 32986: 86000US99925 -132.94593 55.55020 99925 NA 826 368 7 0 all 860 ZCTA5 99925
# 32987: 86000US99926 -131.47074 55.13807 99926 NA 1711 141 0 2 all 860 ZCTA5 99926
# 32988: 86000US99927 -133.45792 56.23906 99927 NA 123 114 0 0 all 860 ZCTA5 99927
# 32989: 86000US99929 -131.60683 56.41383 99929 NA 2365 1643 5 60 all 860 ZCTA5 99929
From Census 2010. Download national data with download_census("decennial", 2010, "US")
and then:
zip_2010 <- read_decennial(
year = 2010,
states = "US",
table_contents = c(
"white = P0030002",
"black = P0030003",
"asian = P0030005"
),
geo_headers = "ZCTA5",
summary_level = "860"
)
# lon lat ZCTA5 state population white black asian GEOCOMP SUMLEV
# 1: -66.74996 18.18056 00601 NA 18570 17285 572 5 all 860
# 2: -67.17613 18.36227 00602 NA 41520 35980 2210 22 all 860
# 3: -67.11989 18.45518 00603 NA 54689 45348 4141 85 all 860
# 4: -66.93291 18.15835 00606 NA 6615 5883 314 3 all 860
# 5: -67.12587 18.29096 00610 NA 29016 23796 2083 37 all 860
# ---
# 33116: -130.04103 56.00232 99923 NA 87 79 0 0 all 860
# 33117: -132.94593 55.55020 99925 NA 819 350 2 4 all 860
# 33118: -131.47074 55.13807 99926 NA 1460 145 6 2 all 860
# 33119: -133.45792 56.23906 99927 NA 94 74 0 0 all 860
# 33120: -131.60683 56.41383 99929 NA 2338 1691 3 33 all 860
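To append these columns to your own records, which is what the question asks for, a merge on ZCTA5 should be enough. Here is a rough sketch using base R only. The records data frame is made up, and the small zip_2010 stand-in just copies three rows from the output above; ZCTA5 prints with leading zeros, so I am assuming it is stored as character.

```r
# Your own records (made-up rows); zip was read in as a number here
records <- data.frame(id = 1:3, zip = c(601, 603, 12345))

# A few rows standing in for zip_2010 (values copied from the output above)
zip_2010 <- data.frame(
  ZCTA5 = c("00601", "00602", "00603"),
  population = c(18570, 41520, 54689),
  stringsAsFactors = FALSE
)

# Restore leading zeros so the key matches the character ZCTA5 column
records$ZCTA5 <- sprintf("%05d", as.integer(records$zip))

# Left join: keep every record, with NA where no ZCTA matches
merged <- merge(records, zip_2010, by = "ZCTA5", all.x = TRUE)
merged
```

Using all.x = TRUE keeps records whose ZIP code has no matching ZCTA (PO-box-only ZIP codes, for example) instead of silently dropping them.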
Upvotes: 2
Reputation:
Your best bet is probably with the U.S. Census Bureau TIGER/Line shapefiles. They have ZIP code tabulation area shapefiles (ZCTA5) for 2010 at the state level which may be sufficient for your purposes.
Census data itself can be found at American FactFinder. For example, you can get population estimates at the sub-county level (i.e. city/town), but not straightforward population estimates at the ZIP code level. I don't know the details of your data set, but one solution might be to use the relationship tables that are also available as part of the TIGER/Line data, or alternatively to spatially join the place names containing the census data (subcounty shapefiles) with the ZCTA5 codes.
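As a rough illustration of the relationship-table idea, the sketch below allocates place-level population to ZCTAs by a share column and sums per ZCTA. Everything here is made up for the example: the column names (place, zcta5, share) and all values are hypothetical, and the real relationship files use their own field names that you would need to check.

```r
# Made-up place-level populations
place_pop <- data.frame(place = c("A", "B"), population = c(1000, 500))

# Made-up relationship table: share of each place's population in each ZCTA
rel <- data.frame(
  place = c("A", "A", "B"),
  zcta5 = c("00001", "00002", "00002"),
  share = c(0.6, 0.4, 1.0),
  stringsAsFactors = FALSE
)

# Allocate each place's population to ZCTAs by share, then sum per ZCTA
alloc <- merge(rel, place_pop, by = "place")
alloc$pop_part <- alloc$population * alloc$share
zcta_pop <- aggregate(pop_part ~ zcta5, data = alloc, FUN = sum)
zcta_pop
```

This assumes population is spread within a place in proportion to the share, which is only an approximation.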
Note from the metadata: "These products are free to use in a product or publication, however acknowledgement must be given to the U.S. Census Bureau as the source."
HTH
Upvotes: 0
Reputation: 44658
In short, no. Census to zip translations are generally created from proprietary sources.
It's unlikely that you'll find anything at the zipcode level from a census perspective (privacy). However, that doesn't mean you're left in the cold. You can use the zipcodes that you have and append census data from the MSA, muSA or CSA level. Now all you need is a listing of postal codes within your MSA, muSA or CSA so that you can merge. There's a bunch online that are pretty cheap if you don't already have such a list.
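Once you have such a listing, the append is a two-step lookup: ZIP code to area, then area to statistics. A minimal sketch in R, where the zip_to_msa lookup, the MSA codes, and the income figures are all made up for illustration:

```r
# Your records, keyed by ZIP code (made-up rows)
records <- data.frame(id = 1:3, zip = c("10001", "60601", "10002"),
                      stringsAsFactors = FALSE)

# Hypothetical ZIP-to-MSA listing and MSA-level statistics (made-up values)
zip_to_msa <- data.frame(zip = c("10001", "10002", "60601"),
                         msa = c("M1", "M1", "M2"),
                         stringsAsFactors = FALSE)
msa_stats <- data.frame(msa = c("M1", "M2"),
                        median_income = c(70000, 65000),
                        stringsAsFactors = FALSE)

# Step 1: look up the MSA for each record's ZIP code
records$msa <- zip_to_msa$msa[match(records$zip, zip_to_msa$zip)]

# Step 2: look up the MSA-level statistic for each record
records$median_income <- msa_stats$median_income[match(records$msa, msa_stats$msa)]
records
```

match() keeps the records in their original order and leaves NA for ZIP codes missing from the listing.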
For example, in Canada, we can get income data from CRA at the FSA level (the first three digits of a postal code in the form A1A 1A1). I'm not sure whether the IRS provides similar information, and I'm not too familiar with US Census data, but I imagine they provide information at the CSA level at the very least.
If you're bewildered by all these acronyms:
Upvotes: 6