Emman

Reputation: 4201

Convert nested list to dataframe: extract only specific elements of interest

I've seen many similar questions but couldn't adapt them to my situation. My data comes as a nested list, and I want to convert it to a data frame with one row per variable, keeping only a few specific fields.

my_data_object <-
  list(my_variables = list(
    age = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "what is your age?",
      valueDescriptions = NULL
    ),
    med_field = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 6L,
      title = "what medical branch are you at?",
      valueDescriptions = list(card = "Cardiology", ophth = "Ophthalmology",
                               derm = "Dermatology")
    ),
    covid_vaccine = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 8L,
      title = "when do you plan to get vaccinated?",
      valueDescriptions = list(
        next_mo = "No later than next month",
        within_six_mo = "No later than six months from now",
        never = "I will not get vaccinated"
      )
    )
  ))

Desired Output

  var_name      type    originType title                              
  <chr>         <chr>   <chr>      <chr>                              
1 age           numeric slider     what is your age?                  
2 med_field     string  choice     what medical branch are you at?    
3 covid_vaccine string  choice     when do you plan to get vaccinated?

My unsuccessful attempt

library(tibble)
library(tidyr)

my_data_object %>% 
  enframe() %>% 
  unnest_longer(value) %>% 
  unnest(value)

## # A tibble: 18 x 3
##    name         value            value_id     
##    <chr>        <named list>     <chr>        
##  1 my_variables <chr [1]>        age          
##  2 my_variables <chr [1]>        age          
##  3 my_variables <named list [0]> age          
##  4 my_variables <int [1]>        age          
##  5 my_variables <chr [1]>        age          
##  6 my_variables <NULL>           age          
##  7 my_variables <chr [1]>        med_field    
##  8 my_variables <chr [1]>        med_field    
##  9 my_variables <named list [0]> med_field    
## 10 my_variables <int [1]>        med_field    
## 11 my_variables <chr [1]>        med_field    
## 12 my_variables <named list [3]> med_field    
## 13 my_variables <chr [1]>        covid_vaccine
## 14 my_variables <chr [1]>        covid_vaccine
## 15 my_variables <named list [0]> covid_vaccine
## 16 my_variables <int [1]>        covid_vaccine
## 17 my_variables <chr [1]>        covid_vaccine
## 18 my_variables <named list [3]> covid_vaccine

I'm trying to do this with tidyverse functions, but so far I don't seem to be headed in the right direction. I'd be grateful for any guidance.

EDIT

Unlike the example data I originally provided, my real data comes in a slightly different hierarchy. I thought this would be simple to generalize once I had the method, but it turns out it's not. Consider data structured like the following, where I only care about the my_variables sub-list.

my_data_object_2 <-
  list(
  other_variables = list(
    whatever_var_1 = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "blah question",
      valueDescriptions = NULL
    )
  ),
  my_variables = list(
    age = list(
      type = "numeric",
      originType = "slider",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 5L,
      title = "what is your age?",
      valueDescriptions = NULL
    ),
    med_field = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 6L,
      title = "what medical branch are you at?",
      valueDescriptions = list(card = "Cardiology", ophth = "Ophthalmology",
                               derm = "Dermatology")
    ),
    covid_vaccine = list(
      type = "string",
      originType = "choice",
      originSettings = structure(list(), .Names = character(0)),
      originIndex = 8L,
      title = "when do you plan to get vaccinated?",
      valueDescriptions = list(
        next_mo = "No later than next month",
        within_six_mo = "No later than six months from now",
        never = "I will not get vaccinated"
      )
    )
  )
)

So how can I "zoom in" on (extract) my_variables and only then build the table shown in "Desired Output" above?

Upvotes: 2

Views: 361

Answers (3)

G. Grothendieck

Reputation: 269854

Iterate over my_data_object, tibblifying the indicated fields, and combine the results with map_dfr (or fun(my_data_object$my_variables) may be sufficient, depending on what the general case is). There are no missing fields in the example data, but if any of the three spec fields can be missing, add .default = NA as an argument to that field's lcol_chr.

library(purrr)
library(tibblify)

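# Fields to keep, each parsed as a character column
spec <- lcols(
  lcol_chr("type"),
  lcol_chr("originType"),
  lcol_chr("title")
)
# One row per variable; the list names become the var_name column
fun <- function(x) cbind(var_name = names(x), tibblify(x, spec))

map_dfr(my_data_object, fun)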

giving:

       var_name    type originType                               title
1           age numeric     slider                   what is your age?
2     med_field  string     choice     what medical branch are you at?
3 covid_vaccine  string     choice when do you plan to get vaccinated?
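The .default = NA idea can also be sketched in plain base R, without tibblify. This is a swapped-in technique, not the answer's method: the helper pick_fields() is hypothetical, and the data is a trimmed copy of the question's my_variables with title deliberately removed from age to show the NA fallback.

```r
# Hypothetical helper (not from the answer): one row per variable,
# one character column per requested field; absent or NULL fields become NA.
pick_fields <- function(vars, fields) {
  out <- data.frame(var_name = names(vars), stringsAsFactors = FALSE)
  for (f in fields) {
    out[[f]] <- vapply(vars, function(v) {
      val <- v[[f]]  # [[ on a list returns NULL when the name is absent
      if (is.null(val)) NA_character_ else as.character(val)
    }, character(1), USE.NAMES = FALSE)
  }
  out
}

# Trimmed copy of the question's my_variables; title removed from age on purpose
vars <- list(
  age = list(type = "numeric", originType = "slider"),
  med_field = list(type = "string", originType = "choice",
                   title = "what medical branch are you at?"),
  covid_vaccine = list(type = "string", originType = "choice",
                       title = "when do you plan to get vaccinated?")
)

res <- pick_fields(vars, c("type", "originType", "title"))
print(res)
```

For the data in the EDIT, the same call would be pick_fields(my_data_object_2$my_variables, c("type", "originType", "title")).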

Depending on what the general case is, this simplification by @mgirlich (similar to the alternative mentioned at the start of this answer) may work. spec is as defined above.

library(tibblify)

cbind(
  var_name = names(my_data_object[[1]]),
  tibblify(my_data_object[[1]], spec)
)

Upvotes: 2

Ronak Shah

Reputation: 389125

You can flatten the object to remove the outer my_variables level, then use enframe to get one row per variable and unnest_wider to spread each variable's fields into columns.

library(tidyverse)

my_data_object %>% 
  flatten() %>%
  tibble::enframe() %>%
  unnest_wider(value)
  
#  name          type    originType originIndex title                               valueDescriptions
#  <chr>         <chr>   <chr>            <int> <chr>                               <list>           
#1 age           numeric slider               5 what is your age?                   <NULL>           
#2 med_field     string  choice               6 what medical branch are you at?     <named list [3]> 
#3 covid_vaccine string  choice               8 when do you plan to get vaccinated? <named list [3]> 

You can then drop the columns that you don't need.


To use only my_data_object_2$my_variables:

my_data_object_2$my_variables %>%
  tibble::enframe() %>%
  unnest_wider(value)
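Putting the pieces of this answer together for the EDIT's data, here is a sketch that selects my_variables by name, names the id column var_name via enframe(name = ...), and keeps only the desired columns with dplyr::select(). The use of dplyr and the trimmed copy of my_data_object_2 (originSettings and valueDescriptions omitted) are assumptions, not part of the original answer.

```r
library(tibble)  # enframe()
library(tidyr)   # unnest_wider(); also re-exports %>%
library(dplyr)   # select()

# Trimmed copy of the question's my_data_object_2
# (originSettings and valueDescriptions omitted for brevity)
my_data_object_2 <- list(
  other_variables = list(
    whatever_var_1 = list(type = "numeric", originType = "slider",
                          originIndex = 5L, title = "blah question")
  ),
  my_variables = list(
    age = list(type = "numeric", originType = "slider",
               originIndex = 5L, title = "what is your age?"),
    med_field = list(type = "string", originType = "choice",
                     originIndex = 6L, title = "what medical branch are you at?"),
    covid_vaccine = list(type = "string", originType = "choice",
                         originIndex = 8L, title = "when do you plan to get vaccinated?")
  )
)

res <- my_data_object_2$my_variables %>%
  enframe(name = "var_name") %>%              # one row per variable
  unnest_wider(value) %>%                     # one column per field
  select(var_name, type, originType, title)   # drop fields not needed
print(res)
```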

Upvotes: 2

jay.sf

Reputation: 73252

Use lapply as usual to select the specific elements, then rbind the results.

res <- do.call(rbind.data.frame, 
               lapply((my_data_object)[[1]], `[`, c("type", "originType", "title")))
res
#                  type originType                               title
# age           numeric     slider                   what is your age?
# med_field      string     choice     what medical branch are you at?
# covid_vaccine  string     choice when do you plan to get vaccinated?
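Note that [[1]] picks the first top-level element, which for my_data_object_2 is other_variables, not my_variables. A minimal sketch of the same lapply approach that selects the sub-list by name instead, using a trimmed copy of the question's data (only the fields used here are kept):

```r
# Trimmed copy of the question's my_data_object_2 (only the fields used here)
my_data_object_2 <- list(
  other_variables = list(
    whatever_var_1 = list(type = "numeric", originType = "slider",
                          title = "blah question")
  ),
  my_variables = list(
    age = list(type = "numeric", originType = "slider",
               title = "what is your age?"),
    med_field = list(type = "string", originType = "choice",
                     title = "what medical branch are you at?"),
    covid_vaccine = list(type = "string", originType = "choice",
                         title = "when do you plan to get vaccinated?")
  )
)

# my_data_object_2[[1]] would be other_variables, so index the sub-list by name
res2 <- do.call(rbind.data.frame,
                lapply(my_data_object_2[["my_variables"]], `[`,
                       c("type", "originType", "title")))
print(res2)
```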

If you want the row names as the first column, do:

`rownames<-`(cbind(var=rownames(res), res), NULL)
#             var    type originType                               title
# 1           age numeric     slider                   what is your age?
# 2     med_field  string     choice     what medical branch are you at?
# 3 covid_vaccine  string     choice when do you plan to get vaccinated?

Upvotes: 2
