Eric Green
Eric Green

Reputation: 7725

unnest_auto and unnest_longer to unnest multiple columns

I have a nested dataframe that I'm trying to unnest. Here's a fake example of the structure.

df <- structure(list(`_id` = c("a", "b", "c", "d"), 
                     variable = list(structure(list(type = c("u", "a", "u", "a", "u", "a", "a"), 
                                                    m_ = c("m1",
                                                           "m2",
                                                           "m3",
                                                           "m4",
                                                           "m5",
                                                           "m6", "m7"), #omitted from original example by mistake 
                                                    t_ = c("2015-07-21 4:13 PM", 
                                                           "2016-04-21 7:25 PM", 
                                                           "2017-10-04 9:49 PM", 
                                                           "2018-12-04 12:29 PM", 
                                                           "2019-04-20 20:20 AM", 
                                                           "2016-05-20 12:00 AM", 
                                                           "2016-06-20 12:00 AM"), 
                                                    a_ = c(NA, 
                                                           "", 
                                                           NA, 
                                                           "", 
                                                           NA, 
                                                           "C", 
                                                           "C")), 
                                               class = "data.frame", 
                                               row.names = c(NA, 7L)), 
                                     structure(list(type = c("u", "a"), 
                                                    m_ = c("m1",
                                                           "m2"), 
                                                    t_ = c("2018-05-24 12:08 AM", 
                                                           "2019-04-24 3:05 PM"), 
                                                    a_ = c(NA, "")), 
                                               class = "data.frame", 
                                               row.names = 1:2), 
                                     structure(list(type = "u", 
                                                    m_ = "m1", 
                                                    t_ = "2018-02-17 3:14 PM"), 
                                               class = "data.frame", 
                                               row.names = 1L), 
                                     structure(list(type = "u", 
                                                    m_ = "m1",
                                                    t_ = "2016-05-27 5:14 PM",
                                                    b_ = "b1", 
                                                    i_ = "i1", 
                                                    e_ = structure(list(), 
                                                                   .Names = character(0), 
                                                                   class = "data.frame", 
                                                                   row.names = c(NA, -1L)), 
                                                    l_ = "l1"), 
                                               class = "data.frame", 
                                               row.names = 1L)),
                     myDate = structure(c(1521503311.992, 
                                          1521514011.161, 
                                          1551699584.65, 
                                          1553632693.94), 
                                        class = c("POSIXct", "POSIXt"))), 
                row.names = c(1L, 2L, 3L, 4L), 
                class = "data.frame")
View(df)

enter image description here

variable is a list of dataframes that vary in length (max fields is 7 in this example, but can expand over time).

I tried using the development version of tidyr to take advantage of the new unnest_auto() function.

# devtools::install_github("tidyverse/tidyr")
df2 <- unnest_auto(df, variable)
View(df2)

enter image description here

If I use unnest_longer on the result and specify one column like type I get it to expand.

df3 <- unnest_longer(df2, type)

enter image description here

I don't see any arguments to unnest_longer() that handle multiple columns. Is there a better approach?

Upvotes: 2

Views: 2565

Answers (2)

hadley
hadley

Reputation: 103898

Here, since you're unnesting a two dimensional structure (i.e. you want to change both the rows and columns), you can just use unnest:

library(tidyr)

df <- as_tibble(df)
df
#> # A tibble: 4 × 3
#>   `_id` variable     myDate             
#>   <chr> <list>       <dttm>             
#> 1 a     <df [7 × 4]> 2018-03-19 18:48:31
#> 2 b     <df [2 × 4]> 2018-03-19 21:46:51
#> 3 c     <df [1 × 3]> 2019-03-04 05:39:44
#> 4 d     <df [1 × 7]> 2019-03-26 15:38:13

df |> 
  unnest(variable) 
#> # A tibble: 11 × 10
#>    `_id` type  m_    t_        a_    b_    i_    e_    l_    myDate             
#>    <chr> <chr> <chr> <chr>     <chr> <chr> <chr> <df[> <chr> <dttm>             
#>  1 a     u     m1    2015-07-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  2 a     a     m2    2016-04-… ""    <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  3 a     u     m3    2017-10-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  4 a     a     m4    2018-12-… ""    <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  5 a     u     m5    2019-04-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  6 a     a     m6    2016-05-… "C"   <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  7 a     a     m7    2016-06-… "C"   <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  8 b     u     m1    2018-05-…  <NA> <NA>  <NA>        <NA>  2018-03-19 21:46:51
#>  9 b     a     m2    2019-04-… ""    <NA>  <NA>        <NA>  2018-03-19 21:46:51
#> 10 c     u     m1    2018-02-…  <NA> <NA>  <NA>        <NA>  2019-03-04 05:39:44
#> 11 d     u     m1    2016-05-…  <NA> b1    i1          l1    2019-03-26 15:38:13

If you did want to do it in two steps, you could take advantage of the fact that unnest_longer() now takes a tidyselect specification:

df |> 
  unnest_wider(variable) |> 
  unnest_longer(type:a_)
#> # A tibble: 11 × 10
#>    `_id` type  m_    t_        a_    b_    i_    e_    l_    myDate             
#>    <chr> <chr> <chr> <chr>     <chr> <chr> <chr> <df[> <chr> <dttm>             
#>  1 a     u     m1    2015-07-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  2 a     a     m2    2016-04-… ""    <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  3 a     u     m3    2017-10-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  4 a     a     m4    2018-12-… ""    <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  5 a     u     m5    2019-04-…  <NA> <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  6 a     a     m6    2016-05-… "C"   <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  7 a     a     m7    2016-06-… "C"   <NA>  <NA>        <NA>  2018-03-19 18:48:31
#>  8 b     u     m1    2018-05-…  <NA> <NA>  <NA>        <NA>  2018-03-19 21:46:51
#>  9 b     a     m2    2019-04-… ""    <NA>  <NA>        <NA>  2018-03-19 21:46:51
#> 10 c     u     m1    2018-02-…  <NA> <NA>  <NA>        <NA>  2019-03-04 05:39:44
#> 11 d     u     m1    2016-05-…  <NA> b1    i1          l1    2019-03-26 15:38:13

Upvotes: 9

Eric Green
Eric Green

Reputation: 7725

This appears to work:

df %>% unnest_auto(variable) %>% unnest()

#Warning message:
#`cols` is now required.
#Please use `cols = c(type, m_, t_, a_, e_)`

df %>% unnest_auto(variable) %>% unnest(cols = c(type, m_, t_, a_, e_, l_))

# A tibble: 11 x 10
   `_id` type  m_    t_     a_    b_    i_    e_    l_    myDate             
   <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <???> <chr> <dttm>             
 1 a     u     m1    2015-… NA    NA    NA    NA    NA    2018-03-20 02:48:31
 2 a     a     m2    2016-… ""    NA    NA    NA    NA    2018-03-20 02:48:31
 3 a     u     m3    2017-… NA    NA    NA    NA    NA    2018-03-20 02:48:31
 4 a     a     m4    2018-… ""    NA    NA    NA    NA    2018-03-20 02:48:31
 5 a     u     m5    2019-… NA    NA    NA    NA    NA    2018-03-20 02:48:31
 6 a     a     m6    2016-… C     NA    NA    NA    NA    2018-03-20 02:48:31
 7 a     a     m7    2016-… C     NA    NA    NA    NA    2018-03-20 02:48:31
 8 b     u     m1    2018-… NA    NA    NA    NA    NA    2018-03-20 05:46:51
 9 b     a     m2    2019-… ""    NA    NA    NA    NA    2018-03-20 05:46:51
10 c     u     m1    2018-… NA    NA    NA    NA    NA    2019-03-04 14:39:44
11 d     u     m1    2016-… NA    b1    i1    NA    l1    2019-03-26 23:38:13

Upvotes: 3

Related Questions