Reputation: 262
I am mapping over a series of entries at level(x) of a json. For each level x, there are nested levels (x+1) containing some information that I want to combine into a data frame along with some information from x.
This is a toy example I'm using to learn purrr and handling json in R.
E.g.
(entry) <- level x
  (year: 2016) <- want this
  (category: "physics") <- want this
  (winners)
    (1) <- level x+1
      (name: "bob") <- want this
      (id: ) <- want this
    (2..n) <- level x+1
      (name: "steve") <- want this
      (id: ) <- want this
To make a dataframe:
name   id  year  category
bob    1   2016  physics
steve  2   2016  physics
mel    3   2016  chemistry
.. etc
I have this solved, but it's using a nested map on every level of x and is very brittle:
library(purrr)
library(tidyverse)
library(stringr)
library(jsonlite)

# get example data
winners <- fromJSON("http://api.nobelprize.org/v1/prize.json", simplifyDataFrame=FALSE)

x <- winners$prizes %>%
  map_df(function(prize) {
    map_df(prize$laureates, function(person) {
      tibble(id = person$id, firstname = person$firstname,
             # some laureates have no surname; fall back to NA
             surname = ifelse(!is.null(person$surname),
                              person$surname, NA),
             category = prize$category, year = prize$year)
    })
  })
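To try the same nested pattern without a network call, it can be run on a hand-built list shaped like the outline above. This is only a minimal sketch: the `prizes` list, its `winners` field, and the ids are made-up toy values, not data from the API.

```r
library(purrr)
library(tibble)

# Hypothetical toy data mirroring the outline above (ids are invented)
prizes <- list(
  list(year = "2016", category = "physics",
       winners = list(list(name = "bob", id = "1"),
                      list(name = "steve", id = "2"))),
  list(year = "2016", category = "chemistry",
       winners = list(list(name = "mel", id = "3")))
)

# Outer map walks level x, inner map walks level x+1
toy <- map_df(prizes, function(prize) {
  map_df(prize$winners, function(person) {
    tibble(name = person$name, id = person$id,
           year = prize$year, category = prize$category)
  })
})

print(toy)
```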
Is there a better way to be doing this? Concerns w/ above code:
- Are there purrr functions that I'm unaware of that I can use?
- Without the ifelse on surname it would fail. Technically, I should be wrapping everything inside the tibble call with ifelse, but then it becomes extremely verbose and doesn't feel like the right solution.
Upvotes: 0
Views: 953
Reputation: 78792
What you did was — as they say in New England — perfectly fine, esp since it resulted in a working solution that was readable by other folks (i.e. the two most important things).
This is the approach I'd take (it's only slightly different):
winners <- fromJSON("http://api.nobelprize.org/v1/prize.json", simplifyDataFrame=FALSE)

extract_laureates <- function(x) {
  surname <- NULL
  map_df(x$laureates, flatten_df) %>%
    mutate(name=paste(firstname, surname, sep=" "),
           year=x$year,
           category=x$category) %>%
    select(name, id, year, category)
}

map_df(winners$prizes, extract_laureates)
## # A tibble: 911 × 4
##    name                   id    year  category
##    <chr>                  <chr> <chr> <chr>
##  1 David J. Thouless      928   2016  physics
##  2 F. Duncan M. Haldane   929   2016  physics
##  3 J. Michael Kosterlitz  930   2016  physics
##  4 Jean-Pierre Sauvage    931   2016  chemistry
##  5 Sir J. Fraser Stoddart 932   2016  chemistry
##  6 Bernard L. Feringa     933   2016  chemistry
##  7 Yoshinori Ohsumi       927   2016  medicine
##  8 Bob Dylan              937   2016  literature
##  9 Juan Manuel Santos     934   2016  peace
## 10 Oliver Hart            935   2016  economics
## # ... with 901 more rows
Unless I'm writing a quick hack that I'm pretty sure I'll never use again, I like to make non-anonymous functions since it helps when breaking down the logic/steps.
You can use the scoping rules of R to simplify the ifelse() by declaring a variable with the same name as the column. If dplyr finds a column with that name it'll use it. If not, R will use the local variable.
Then, we add the year and category to the new data_frame and select() out what you wanted.
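The column-then-local-variable lookup can be seen in isolation with a small sketch. Everything here is hypothetical illustration: the `show_surname` helper, the placeholder string, and the toy rows are not part of the code above.

```r
library(dplyr)
library(tibble)

show_surname <- function(df) {
  # Local fallback: only used when `df` has no column named `surname`
  surname <- "(no surname)"
  mutate(df, shown = surname)
}

has_col <- tibble(firstname = "bob", surname = "smith")  # toy row with a surname column
no_col  <- tibble(firstname = "steve")                   # toy row without one

with_column    <- show_surname(has_col)$shown
without_column <- show_surname(no_col)$shown

print(with_column)
print(without_column)
```

Note that the fallback only applies when the column is absent altogether. If the column exists but holds NA for some rows, the column still wins and those rows stay NA.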
To address your specific questions:
I can't think of a way around the nested map…() calls. Even if I could come up with one, it'd prbly look like an ugly, unreadable hack (remember, you're writing code for humans).
Another option is to wait until the "filled" data.frame is built, then do the name processing:
extract_laureates <- function(x) {
  map_df(x$laureates, flatten_df) %>%
    mutate(year=x$year, category=x$category)
}

map_df(winners$prizes, extract_laureates) %>%
  # missing surnames arrive as NA; blank them so paste() doesn't print "NA"
  mutate(surname=ifelse(is.na(surname), "", surname),
         name=trimws(paste(firstname, surname, sep=" "))) %>%
  select(name, id, year, category) %>% View()
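Why deferring the name processing works can be checked on toy data: when rows are bound together, a field missing from one entry is filled with NA, so the missing values can be handled once at the end. A minimal sketch, assuming made-up entries (`people` and its values are hypothetical, not API data):

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical entries; the second one has no `surname` field at all
people <- list(
  list(id = "1", firstname = "bob", surname = "smith"),
  list(id = "2", firstname = "acme")
)

# Binding the one-row tibbles fills the absent field with NA
filled <- map_df(people, as_tibble)

# Handle the missing surnames in a single step after the frame is built
named <- filled %>%
  mutate(name = if_else(is.na(surname), firstname,
                        paste(firstname, surname, sep = " "))) %>%
  select(name, id)

print(named)
```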
Upvotes: 1