Extract multiple columns from list of lists, and save in data.frame

Question

I have the following list:

library(rjson)
j <- fromJSON(file='https://esgf-data.dkrz.de/esg-search/search/?offset=0&limit=1000&type=Dataset&replica=false&latest=true&project=CORDEX&domain=EUR-11&experiment=rcp85&time_frequency=day&facets=rcm_name%2Cproject%2Cproduct%2Cdomain%2Cinstitute%2Cdriving_model%2Cexperiment%2Cexperiment_family%2Censemble%2Crcm_version%2Ctime_frequency%2Cvariable%2Cvariable_long_name%2Ccf_standard_name%2Cdata_node&format=application%2Fsolr%2Bjson')

I am interested in extracting data from this component: j$response$docs, which is a list of lists. The 'internal' lists are all supposed to have the same names.

I want to save the output to a data.frame() or tibble().

This below works and gives the desired output, for the few selected variables:

nmod <- length(j$response$docs)
for (i in 1:nmod) {
    #select one list at a time
    j1 <- j$response$docs[[i]]
    tmp <- data.frame(variable=j1$variable,
                        variable_long_name=j1$variable_long_name,
                        rcm_name=j1$rcm_name,
                        driving_model=j1$driving_model,
                        cf_standard_name=j1$cf_standard_name
                        )
    #join them
    if (i==1) {
        d <- tmp
    } else {
        d <- rbind(d, tmp)
    }
}

However, I'd like to know if there is a more elegant and efficient way, maybe using tidyr, dplyr or purrr, which also would allow me to select all ¨columns¨, instead of just the few selected there.

aosmith · Accepted Answer

You can do it with help from package purrr. I thought at_depth might work here, but instead I ended up using nested map_df.

library(purrr)

Your variables are different lengths, so the first thing to do is to make sure each variable is length 1. This can be done by collapsing each element of the inner list with paste. I used commas a separator. Doing this via map_df returns a 1 row tibble.

Here's an example with the first inner list.

map_df(j$response$docs[[1]], paste, collapse = ",")

Now we can loop through the outer lists, making a 1 row tibble for each. We use map_df to bind each of these together. The output is a 832 row tibble, one row per list. I used the .id argument to add a grouping variable to the result, which may not be needed.

d1 = map_df(j$response$docs, ~map_df(.x, paste, collapse = ","))
d1

# A tibble: 832 × 45
   group                                                                                                   id  version
                                                                                                       
1      1   cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.clh.v20131119|cordexesg.dmi.dk 20131119
2      2 cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.clivi.v20131119|cordexesg.dmi.dk 20131119
3      3  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rsds.v20131119|cordexesg.dmi.dk 20131119
4      4  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rlds.v20131119|cordexesg.dmi.dk 20131119
5      5  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rsus.v20131119|cordexesg.dmi.dk 20131119
6      6  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rlus.v20131119|cordexesg.dmi.dk 20131119
7      7  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rsdt.v20131119|cordexesg.dmi.dk 20131119
8      8  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rsut.v20131119|cordexesg.dmi.dk 20131119
9      9  cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.rlut.v20131119|cordexesg.dmi.dk 20131119
10    10   cordex.output.EUR-11.DMI.ICHEC-EC-EARTH.rcp85.r3i1p1.HIRHAM5.v1.day.psl.v20131119|cordexesg.dmi.dk 20131119
# ... with 822 more rows, and 42 more variables:

If you want to get multiple rows for the variables that were greater than length 1, such as access and experiment_family, you can use tidyr::separate_rows to separate the data onto multiple rows.

tidyr::separate_rows(d1, experiment_family)

Extract multiple columns from list of lists, and save in data.frame

Answers (2)

Related Questions