VFreguglia

Reputation: 2311

How to deal with lists of lists when the first index represents rows?

How can I convert a list of lists to a data frame, where the first "layer" of lists represents the rows?

myList = list(
    list(name="name1",num=20,dogs=list("dog1")),
    list(name="name2",num=13,dogs = list()),
    list(name="name3",num=5,dogs=list("dog2","dog4"))
)

My first idea was to unlist the elements in the "third layer":

myUnList = sapply(myList,function(x){y=x;y$dogs = unlist(y$dogs);y})

I can create a tibble from that, but it ends up with a single list column instead of one column per field:

tibble(myUnList)
# A tibble: 3 x 1
    myUnList
      <list>
1 <list [3]>
2 <list [2]>
3 <list [3]>

Note that if myList[[1]] held the vector of names (that is, if the outer list were organized by column), this would be simple, but I'm having trouble tidying data that is organized the other way around. I thought about using purrr to "invert" the order.
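For illustration, here is a minimal sketch of the column-oriented layout described above, the case that converts directly. The name colList is hypothetical and not part of the original data; the sketch assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical column-oriented version of the same data:
# each top-level element is a whole column rather than a row.
colList <- list(
  name = c("name1", "name2", "name3"),
  num  = c(20, 13, 5),
  dogs = list(list("dog1"), list(), list("dog2", "dog4"))
)

# Every element has length 3, so the list maps directly onto a 3-row tibble;
# dogs stays a list column because its entries vary in length.
colTbl <- as_tibble(colList)
```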

Expected result:

# A tibble: 3 x 3
      names       num       dogs
     <list>    <list>     <list>
1 <chr [1]> <dbl [1]> <list [1]>
2 <chr [1]> <dbl [1]> <list [0]>
3 <chr [1]> <dbl [1]> <list [2]>

Are there other types of data structures that support varying-length entries?

Upvotes: 1

Views: 46

Answers (2)

VFreguglia

Reputation: 2311

After some time playing around with purrr, I found another solution that doesn't require typing the field names (which could be troublesome for really large lists).

library(tidyverse)

# as_tibble() replaces the deprecated tbl_df()
myList %>% transpose() %>% simplify_all() %>% as_tibble()

Results in

# A tibble: 3 x 3
   name   num       dogs
  <chr> <dbl>     <list>
1 name1    20 <list [1]>
2 name2    13 <list [0]>
3 name3     5 <list [2]>

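The transpose function from purrr does this inversion automatically: it turns a list of rows into a list of columns, and simplify_all then collapses each column to an atomic vector where possible.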
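To make the inversion concrete, here is a rough base R sketch of the same idea. It is not how purrr implements transpose; it assumes every row has the same fields as the first one, and the names fields, columns and df are only illustrative.

```r
myList <- list(
  list(name = "name1", num = 20, dogs = list("dog1")),
  list(name = "name2", num = 13, dogs = list()),
  list(name = "name3", num = 5,  dogs = list("dog2", "dog4"))
)

# Take the field names from the first row (assumption: all rows share them).
fields <- names(myList[[1]])

# For each field, collect that field from every row:
# a list of rows becomes a list of columns.
columns <- lapply(fields, function(f) lapply(myList, function(row) row[[f]]))
names(columns) <- fields

# Simplify the columns that hold exactly one value per row.
df <- data.frame(
  name = unlist(columns$name),
  num  = unlist(columns$num),
  stringsAsFactors = FALSE
)

# Attach the varying-length column as a list column.
df$dogs <- columns$dogs
```

The field names are never typed by hand when building columns; only the final simplification step names them, which is the part simplify_all handles in the purrr version.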

Upvotes: 1

www

Reputation: 39154

We can extract each field with the map functions from the purrr package and then build a new tibble with data_frame.

library(tidyverse)

dat <- data_frame(name = map_chr(myList, "name"),
                  num = map_dbl(myList, "num"),
                  dogs = map(myList, "dogs"))
dat
# # A tibble: 3 x 3
#    name    num dogs      
#   <chr> <dbl> <list>    
# 1 name1 20.0  <list [1]>
# 2 name2 13.0  <NULL>    
# 3 name3  5.00 <list [2]>

If you prefer every column to be a list column, replace map_chr and map_dbl with map.

dat <- data_frame(name = map(myList, "name"),
                  num = map(myList, "num"),
                  dogs = map(myList, "dogs"))
dat
#   name      num       dogs      
#   <list>    <list>    <list>    
# 1 <chr [1]> <dbl [1]> <list [1]>
# 2 <chr [1]> <dbl [1]> <NULL>    
# 3 <chr [1]> <dbl [1]> <list [2]>
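As a small follow-up sketch on the varying-length entries: a list column can hold a different number of values in each row, and lengths() reports how many each row has. This variant uses base R extractors and tibble() rather than the calls above, assumes the tibble package is installed, and the names dat2 and n_dogs are only illustrative.

```r
library(tibble)

myList <- list(
  list(name = "name1", num = 20, dogs = list("dog1")),
  list(name = "name2", num = 13, dogs = list()),
  list(name = "name3", num = 5,  dogs = list("dog2", "dog4"))
)

dat2 <- tibble(
  # vapply enforces exactly one value of the stated type per row
  name = vapply(myList, function(x) x$name, character(1)),
  num  = vapply(myList, function(x) x$num, numeric(1)),
  # flatten each row's dogs to a character vector (NULL when there are none)
  dogs = lapply(myList, function(x) unlist(x$dogs))
)

# Count the entries in each row of the list column; empty rows count as 0.
dat2$n_dogs <- lengths(dat2$dogs)
```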

Upvotes: 1
