Reputation: 5335
I have a list of lists (of lists of lists... it's lists all the way down) called geos with geolocation information for U.S. cities, returned by the Google Maps API using the geocode() function in ggmap (see the dput output at the bottom of this question for a representative sample of data on 10 cities).
I would now like to use bits of this list to populate a data frame with one row per location, i.e., per element of the vector of locations used in the API query. For argument's sake, let's say I wanted the resulting data frame to include columns for locality, administrative_area_level_2 (county), and administrative_area_level_1 (state), using long names for the first two and the short name for the last. Here's how the desired result would look.
   locality          administrative_area_level_2 administrative_area_level_1
1  Franconia         Grafton County              NH
2  Wausau            Marathon County             WI
3  Northfield        Franklin County             MA
4  South Bend        St. Joseph County           IN
5  Lanesboro         Fillmore County             MN
6  Cheboygan         Cheboygan County            MI
7  Chelmsford        Middlesex County            MA
8  Saint Clairsville Belmont County              OH
9  New Hyde Park     Nassau County               NY
10 Jefferson         Ashe County                 NC
All of the elements I want are in the address_components sub-list, which I can isolate as follows.
library(dplyr)
library(purrr)
address_components <- geos %>%
  map("results") %>%
  map(1) %>%
  map("address_components")
The tricky bit is that the resulting lists (now items 1 through 10 in the new list called address_components) have varying lengths, the elements of those lists aren't named, and the position of the bits I want changes with list length. Instead of names for the list elements, we have (of course) a list within each list element, called types, that describes what that element is. So, for example, county might be the 2nd or 3rd or 4th element of an item in address_components, and wherever it is, we can recognize it because the types sublist at that position includes the string "administrative_area_level_2" as one of its elements.
Is there a way programmatically to extract certain elements from that list based on these attributes of other elements at their level? In pseudocode, to get the county name, for example, I'd write something like...
if ("administrative_area_level_2" %in% unlist(types)) return long_name
So how can I actually do this in R? Is there some SQL-driven solution to this problem? Or can it be done in the tidyverse with some clever application of purrr functionality?
As promised, here is a sample of the list I'm working with.
geos <- list(list(results = list(list(address_components = list(list(
long_name = "Franconia", short_name = "Franconia", types = list(
"locality", "political")), list(long_name = "Grafton County",
short_name = "Grafton County", types = list("administrative_area_level_2",
"political")), list(long_name = "New Hampshire", short_name = "NH",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US", types = list(
"country", "political"))), formatted_address = "Franconia, NH, USA",
geometry = list(bounds = list(northeast = list(lat = 44.2531679,
lng = -71.537367), southwest = list(lat = 44.112035,
lng = -71.786752)), location = list(lat = 44.2271729,
lng = -71.7479075), location_type = "APPROXIMATE", viewport = list(
northeast = list(lat = 44.2531679, lng = -71.537367),
southwest = list(lat = 44.112035, lng = -71.786752))),
place_id = "ChIJo86bzAl8tEwRtSTsEBwg1Gc", types = list("locality",
"political"))), status = "OK"), list(results = list(list(
address_components = list(list(long_name = "Wausau", short_name = "Wausau",
types = list("locality", "political")), list(long_name = "Marathon County",
short_name = "Marathon County", types = list("administrative_area_level_2",
"political")), list(long_name = "Wisconsin", short_name = "WI",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political"))), formatted_address = "Wausau, WI, USA",
geometry = list(bounds = list(northeast = list(lat = 45.006429,
lng = -89.573319), southwest = list(lat = 44.918368,
lng = -89.7482299)), location = list(lat = 44.9591352,
lng = -89.6301221), location_type = "APPROXIMATE", viewport = list(
northeast = list(lat = 45.006429, lng = -89.573319),
southwest = list(lat = 44.918368, lng = -89.7482299))),
place_id = "ChIJg0go-J0nAIgRXIvo6NhaKQM", types = list("locality",
"political"))), status = "OK"), list(results = list(list(
address_components = list(list(long_name = "Northfield",
short_name = "Northfield", types = list("locality", "political")),
list(long_name = "Franklin County", short_name = "Franklin County",
types = list("administrative_area_level_2", "political")),
list(long_name = "Massachusetts", short_name = "MA",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political"))), formatted_address = "Northfield, MA, USA",
geometry = list(bounds = list(northeast = list(lat = 42.7285309,
lng = -72.377039), southwest = list(lat = 42.604405,
lng = -72.5167739)), location = list(lat = 42.6959093,
lng = -72.4528885), location_type = "APPROXIMATE", viewport = list(
northeast = list(lat = 42.7285309, lng = -72.377039),
southwest = list(lat = 42.604405, lng = -72.5167739))),
place_id = "ChIJ736z8Aw84YkRj0BUEm0QZgE", types = list("locality",
"political"))), status = "OK"), list(results = list(list(
address_components = list(list(long_name = "South Bend",
short_name = "South Bend", types = list("locality", "political")),
list(long_name = "Portage Township", short_name = "Portage Township",
types = list("administrative_area_level_3", "political")),
list(long_name = "St. Joseph County", short_name = "St Joseph County",
types = list("administrative_area_level_2", "political")),
list(long_name = "Indiana", short_name = "IN", types = list(
"administrative_area_level_1", "political")), list(
long_name = "United States", short_name = "US", types = list(
"country", "political"))), formatted_address = "South Bend, IN, USA",
geometry = list(bounds = list(northeast = list(lat = 41.752098,
lng = -86.1912859), southwest = list(lat = 41.5973428,
lng = -86.3604831)), location = list(lat = 41.6763545,
lng = -86.2519898), location_type = "APPROXIMATE", viewport = list(
northeast = list(lat = 41.752098, lng = -86.1912859),
southwest = list(lat = 41.5973428, lng = -86.3604831))),
place_id = "ChIJE9NhSsQyEYgRBDKjb7PZSpc", types = list("locality",
"political"))), status = "OK"), list(results = list(list(
address_components = list(list(long_name = "Lanesboro", short_name = "Lanesboro",
types = list("locality", "political")), list(long_name = "Holt Township",
short_name = "Holt Township", types = list("administrative_area_level_3",
"political")), list(long_name = "Fillmore County",
short_name = "Fillmore County", types = list("administrative_area_level_2",
"political")), list(long_name = "Minnesota", short_name = "MN",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political")), list(long_name = "55949",
short_name = "55949", types = list("postal_code"))),
formatted_address = "Lanesboro, MN 55949, USA", geometry = list(
bounds = list(northeast = list(lat = 43.7312198, lng = -91.9545843),
southwest = list(lat = 43.7060355, lng = -91.9844293)),
location = list(lat = 43.7187813, lng = -91.9759204),
location_type = "APPROXIMATE", viewport = list(northeast = list(
lat = 43.7312198, lng = -91.9545843), southwest = list(
lat = 43.7060355, lng = -91.9844293))), place_id = "ChIJr2SDMZco-ocRb_dB0eZDTLU",
types = list("locality", "political"))), status = "OK"),
list(results = list(list(address_components = list(list(long_name = "Cheboygan",
short_name = "Cheboygan", types = list("locality", "political")),
list(long_name = "Cheboygan County", short_name = "Cheboygan County",
types = list("administrative_area_level_2", "political")),
list(long_name = "Michigan", short_name = "MI", types = list(
"administrative_area_level_1", "political")), list(
long_name = "United States", short_name = "US", types = list(
"country", "political")), list(long_name = "49721",
short_name = "49721", types = list("postal_code"))),
formatted_address = "Cheboygan, MI 49721, USA", geometry = list(
bounds = list(northeast = list(lat = 45.669849, lng = -84.4330271),
southwest = list(lat = 45.6198179, lng = -84.4984899)),
location = list(lat = 45.6469563, lng = -84.4744795),
location_type = "APPROXIMATE", viewport = list(northeast = list(
lat = 45.669849, lng = -84.4330271), southwest = list(
lat = 45.6198179, lng = -84.4984899))), place_id = "ChIJywA0rYKiNU0R6yCfyEI79dI",
types = list("locality", "political"))), status = "OK"),
list(results = list(list(address_components = list(list(long_name = "Chelmsford",
short_name = "Chelmsford", types = list("locality", "political")),
list(long_name = "Middlesex County", short_name = "Middlesex County",
types = list("administrative_area_level_2", "political")),
list(long_name = "Massachusetts", short_name = "MA",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political"))), formatted_address = "Chelmsford, MA, USA",
geometry = list(bounds = list(northeast = list(lat = 42.653754,
lng = -71.2942208), southwest = list(lat = 42.5496288,
lng = -71.4178121)), location = list(lat = 42.5998139,
lng = -71.3672838), location_type = "APPROXIMATE",
viewport = list(northeast = list(lat = 42.653754,
lng = -71.2942208), southwest = list(lat = 42.5496288,
lng = -71.4178121))), place_id = "ChIJx0tLqRej44kRi__M1sjNzjc",
types = list("locality", "political"))), status = "OK"),
list(results = list(list(address_components = list(list(long_name = "Saint Clairsville",
short_name = "St Clairsville", types = list("locality",
"political")), list(long_name = "Richland Township",
short_name = "Richland Township", types = list("administrative_area_level_3",
"political")), list(long_name = "Belmont County",
short_name = "Belmont County", types = list("administrative_area_level_2",
"political")), list(long_name = "Ohio", short_name = "OH",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political")), list(long_name = "43950",
short_name = "43950", types = list("postal_code"))),
formatted_address = "St Clairsville, OH 43950, USA",
geometry = list(bounds = list(northeast = list(lat = 40.097176,
lng = -80.8753491), southwest = list(lat = 40.0569829,
lng = -80.9266679)), location = list(lat = 40.0803199,
lng = -80.90176), location_type = "APPROXIMATE",
viewport = list(northeast = list(lat = 40.097176,
lng = -80.8753491), southwest = list(lat = 40.0569829,
lng = -80.9266679))), place_id = "ChIJD9-5fMFwNogRmDV43jTEVS0",
types = list("locality", "political"))), status = "OK"),
list(results = list(list(address_components = list(list(long_name = "New Hyde Park",
short_name = "New Hyde Park", types = list("locality",
"political")), list(long_name = "North Hempstead",
short_name = "North Hempstead", types = list("administrative_area_level_3",
"political")), list(long_name = "Nassau County",
short_name = "Nassau County", types = list("administrative_area_level_2",
"political")), list(long_name = "New York", short_name = "NY",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political"))), formatted_address = "New Hyde Park, NY, USA",
geometry = list(bounds = list(northeast = list(lat = 40.7419718,
lng = -73.6748929), southwest = list(lat = 40.7233181,
lng = -73.69721)), location = list(lat = 40.7351018,
lng = -73.6879082), location_type = "APPROXIMATE",
viewport = list(northeast = list(lat = 40.7419718,
lng = -73.6748929), southwest = list(lat = 40.7233181,
lng = -73.69721))), place_id = "ChIJOfwQ1pJiwokRQIZrHiBxJbA",
types = list("locality", "political"))), status = "OK"),
list(results = list(list(address_components = list(list(long_name = "Jefferson",
short_name = "Jefferson", types = list("locality", "political")),
list(long_name = "Jefferson", short_name = "Jefferson",
types = list("administrative_area_level_3", "political")),
list(long_name = "Ashe County", short_name = "Ashe County",
types = list("administrative_area_level_2", "political")),
list(long_name = "North Carolina", short_name = "NC",
types = list("administrative_area_level_1", "political")),
list(long_name = "United States", short_name = "US",
types = list("country", "political")), list(long_name = "28640",
short_name = "28640", types = list("postal_code"))),
formatted_address = "Jefferson, NC 28640, USA", geometry = list(
bounds = list(northeast = list(lat = 36.430581, lng = -81.422682),
southwest = list(lat = 36.404752, lng = -81.4894969)),
location = list(lat = 36.420403, lng = -81.4734376),
location_type = "APPROXIMATE", viewport = list(northeast = list(
lat = 36.430581, lng = -81.422682), southwest = list(
lat = 36.404752, lng = -81.4894969))), place_id = "ChIJJfTHvEasUYgRsEKY3vcTFgc",
types = list("locality", "political"))), status = "OK"))
Upvotes: 2
Views: 228
Reputation: 5335
After a lot of trial and error, I ended up figuring out how to do this with some help from the pluck() and keep() functions from purrr in particular. I wrote a function that lets me set the attribute I'm after, then used map_dfc() to iterate that function over the three attributes in my desired output: locality name, county name, and state name.
library(tidyverse)
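geo_extractor <- function(api_output, attribute, version = 'long_name') {
  api_output %>%
    # drill down to the address components of the first result for each location
    map(., ~purrr::pluck(., 'results', 1, 'address_components')) %>%
    # keep only the components whose text (types included) matches the attribute pattern
    map(., ~keep(., grepl(attribute, .))) %>%
    # take the requested name from the first matching component
    map_chr(., ~purrr::pluck(., 1, version))
}
desiderata <- c("locality", "level_2", "level_1")
dat <- setNames(map_dfc(desiderata, ~geo_extractor(geos, .)), desiderata)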
Here's how the result looks.
> dat
# A tibble: 10 x 3
   locality          level_2           level_1
   <chr>             <chr>             <chr>
 1 Franconia         Grafton County    New Hampshire
 2 Wausau            Marathon County   Wisconsin
 3 Northfield        Franklin County   Massachusetts
 4 South Bend        St. Joseph County Indiana
 5 Lanesboro         Fillmore County   Minnesota
 6 Cheboygan         Cheboygan County  Michigan
 7 Chelmsford        Middlesex County  Massachusetts
 8 Saint Clairsville Belmont County    Ohio
 9 New Hyde Park     Nassau County     New York
10 Jefferson         Ashe County       North Carolina
I know from solving a related version of this problem a slightly different way that this function will probably fail if the API output (here, geos) includes results for locations that couldn't be resolved or that don't include one or more of the attributes you're seeking (e.g., no county). I also know that you can work around that problem with some properly placed if/else constructs. That's not an issue in this toy example, however, so I'll declare victory for this question and move on.
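For anyone who does hit that failure case, here is a rough sketch of one possible workaround. It is not the code from the related problem: safe_geo_extractor and the toy input are made up for illustration, and the second toy element only imitates an unresolved location by having an empty results list. It leans on pluck() returning NULL (or a .default) instead of erroring when something is missing, and it matches the attribute exactly against types rather than using grepl().

```r
library(purrr)

# Hypothetical helper (not part of the answer above): returns NA instead of
# failing when a location is unresolved or lacks the requested attribute.
safe_geo_extractor <- function(api_output, attribute, version = "long_name") {
  map_chr(api_output, function(x) {
    # pluck() gives NULL rather than an error if any level is missing
    components <- pluck(x, "results", 1, "address_components")
    # first component whose types contain the attribute as an exact string
    hit <- detect(components, ~ attribute %in% unlist(.x$types))
    # fall back to NA when there was no matching component
    pluck(hit, version, .default = NA_character_)
  })
}

# Toy input (assumed shape): one resolved city, one unresolved location
toy <- list(
  list(results = list(list(address_components = list(
    list(long_name = "Franconia", short_name = "Franconia",
         types = list("locality", "political")),
    list(long_name = "Grafton County", short_name = "Grafton County",
         types = list("administrative_area_level_2", "political")),
    list(long_name = "New Hampshire", short_name = "NH",
         types = list("administrative_area_level_1", "political"))))),
    status = "OK"),
  list(results = list(), status = "ZERO_RESULTS")
)

county <- safe_geo_extractor(toy, "administrative_area_level_2")
state <- safe_geo_extractor(toy, "administrative_area_level_1", "short_name")
```

Because the match is exact, the attribute has to be spelled out in full ("administrative_area_level_2" rather than "level_2").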
Upvotes: 1
Reputation: 79228
You could do the following. Note that the result has more columns than the three you asked for:
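library(tidyverse)

# flatten to name/value pairs, then rebuild one row per address component
stack(unlist(setNames(address_components, 1:10))) %>%
  separate(ind, c("grp", "nm"), "[.]") %>%
  group_by(grp, id = cumsum(str_detect(nm, "long_name"))) %>%
  # arguments are named so the call also works with newer tidyr versions
  pivot_wider(id_cols = c(id, grp), names_from = nm, values_from = values) %>%
  pivot_wider(id_cols = grp, names_from = c(types1, types2, types), values_from = long_name)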
# A tibble: 10 x 7
# Groups: grp [10]
grp locality_politic~ administrative_a~ administrative_~ country_politic~ administrative_~ NA_NA_postal_co~
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Franconia Grafton County New Hampshire United States NA NA
2 2 Wausau Marathon County Wisconsin United States NA NA
3 3 Northfield Franklin County Massachusetts United States NA NA
4 4 South Bend St. Joseph County Indiana United States Portage Township NA
5 5 Lanesboro Fillmore County Minnesota United States Holt Township 55949
6 6 Cheboygan Cheboygan County Michigan United States NA 49721
7 7 Chelmsford Middlesex County Massachusetts United States NA NA
8 8 Saint Clairsville Belmont County Ohio United States Richland Townsh~ 43950
9 9 New Hyde Park Nassau County New York United States North Hempstead NA
10 10 Jefferson Ashe County North Carolina United States Jefferson 28640
or if you want the short names:
stack(unlist(setNames(address_components, 1:10))) %>%
  separate(ind, c("grp", "nm"), "[.]") %>%
  group_by(grp, id = cumsum(str_detect(nm, "long_name"))) %>%
  pivot_wider(id_cols = c(id, grp), names_from = nm, values_from = values) %>%
  pivot_wider(id_cols = grp, names_from = c(types1, types2, types), values_from = short_name)
# A tibble: 10 x 7
# Groups: grp [10]
grp locality_politic~ administrative_a~ administrative_~ country_politic~ administrative_~ NA_NA_postal_co~
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Franconia Grafton County NH US NA NA
2 2 Wausau Marathon County WI US NA NA
3 3 Northfield Franklin County MA US NA NA
4 4 South Bend St Joseph County IN US Portage Township NA
5 5 Lanesboro Fillmore County MN US Holt Township 55949
6 6 Cheboygan Cheboygan County MI US NA 49721
7 7 Chelmsford Middlesex County MA US NA NA
8 8 St Clairsville Belmont County OH US Richland Townsh~ 43950
9 9 New Hyde Park Nassau County NY US North Hempstead NA
10 10 Jefferson Ashe County NC US Jefferson 28640
Upvotes: 1
Reputation:
I don't think this gets you all the way there, but it seems like there are several things you might want to do with the data.
Does unnesting and coding it like this do what you would like? From here it can be just a bunch of filters and pivots using standard dplyr and tidyr tools.
Each record from the original nested list is identified by grouping on record and record2.
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)  # needed for pivot_longer() and unnest()
address_long <- address_components %>%
  map_dfr(~ set_names(.x, seq.int(length(.x))), .id = "record") %>%
  pivot_longer(-record, names_to = "record2") %>%
  mutate(name = names(value)) %>%
  mutate(value = simplify_all(value)) %>%
  unnest(value) %>%
  rowid_to_column()
col_types <- address_long %>%
  filter(name == "types",
         value != "political") %>%
  select(record, record2, type = value)
address_long %>%
  filter(name != "types") %>%
  left_join(col_types, by = c("record", "record2"))
# # A tibble: 98 x 6
# rowid record record2 value name type
# <int> <chr> <chr> <chr> <chr> <chr>
# 1 1 1 1 Franconia long_name locality
# 2 2 1 2 Grafton County long_name administrative_area_level_2
# 3 3 1 3 New Hampshire long_name administrative_area_level_1
# 4 4 1 4 United States long_name country
# 5 5 1 1 Franconia short_name locality
# 6 6 1 2 Grafton County short_name administrative_area_level_2
# 7 7 1 3 NH short_name administrative_area_level_1
# 8 8 1 4 US short_name country
# 9 17 2 1 Wausau long_name locality
# 10 18 2 2 Marathon County long_name administrative_area_level_2
# # ... with 88 more rows
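As a rough sketch of the kind of "filters and pivots" step mentioned above, the snippet below starts from a few hand-typed rows copied from the printed output (only the first record, without the country rows) and reshapes them into the layout requested in the question. The object names joined and wanted are made up for illustration, and the filter conditions are an assumption about what the question needs (long names for locality and county, short name for state).

```r
library(dplyr)
library(tidyr)

# Hand-typed stand-in for the joined long table (assumed column layout)
joined <- tibble::tribble(
  ~record, ~record2, ~value,           ~name,        ~type,
  "1",     "1",      "Franconia",      "long_name",  "locality",
  "1",     "2",      "Grafton County", "long_name",  "administrative_area_level_2",
  "1",     "3",      "New Hampshire",  "long_name",  "administrative_area_level_1",
  "1",     "1",      "Franconia",      "short_name", "locality",
  "1",     "2",      "Grafton County", "short_name", "administrative_area_level_2",
  "1",     "3",      "NH",             "short_name", "administrative_area_level_1"
)

wanted <- joined %>%
  # long names for locality and county, short name for state
  filter(
    (type %in% c("locality", "administrative_area_level_2") & name == "long_name") |
      (type == "administrative_area_level_1" & name == "short_name")
  ) %>%
  # one row per record, one column per component type
  pivot_wider(id_cols = record, names_from = type, values_from = value)
```

With the full table, the same two steps should yield one row per original location, with NA wherever a component type is absent.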
In your example, you would want to filter value to
Upvotes: 0