Reputation: 7725
I have a nested dataframe that I'm trying to unnest. Here's a fake example of the structure.
df <- structure(list(`_id` = c("a", "b", "c", "d"),
variable = list(structure(list(type = c("u", "a", "u", "a", "u", "a", "a"),
m_ = c("m1",
"m2",
"m3",
"m4",
"m5",
"m6", "m7"), #omitted from original example by mistake
t_ = c("2015-07-21 4:13 PM",
"2016-04-21 7:25 PM",
"2017-10-04 9:49 PM",
"2018-12-04 12:29 PM",
"2019-04-20 20:20 AM",
"2016-05-20 12:00 AM",
"2016-06-20 12:00 AM"),
a_ = c(NA,
"",
NA,
"",
NA,
"C",
"C")),
class = "data.frame",
row.names = c(NA, 7L)),
structure(list(type = c("u", "a"),
m_ = c("m1",
"m2"),
t_ = c("2018-05-24 12:08 AM",
"2019-04-24 3:05 PM"),
a_ = c(NA, "")),
class = "data.frame",
row.names = 1:2),
structure(list(type = "u",
m_ = "m1",
t_ = "2018-02-17 3:14 PM"),
class = "data.frame",
row.names = 1L),
structure(list(type = "u",
m_ = "m1",
t_ = "2016-05-27 5:14 PM",
b_ = "b1",
i_ = "i1",
e_ = structure(list(),
.Names = character(0),
class = "data.frame",
row.names = c(NA, -1L)),
l_ = "l1"),
class = "data.frame",
row.names = 1L)),
myDate = structure(c(1521503311.992,
1521514011.161,
1551699584.65,
1553632693.94),
class = c("POSIXct", "POSIXt"))),
row.names = c(1L, 2L, 3L, 4L),
class = "data.frame")
View(df)
variable
is a list of dataframes that vary in length (max fields is 7 in this example, but can expand over time).
I tried using the development version of tidyr
to take advantage of the new unnest_auto()
function.
# devtools::install_github("tidyverse/tidyr")
df2 <- unnest_auto(df, variable)
View(df2)
If I use unnest_longer on the result and specify one column like type
I get it to expand.
df3 <- unnest_longer(df2, type)
I don't see any arguments to unnest_longer()
that handle multiple columns. Is there a better approach?
Upvotes: 2
Views: 2565
Reputation: 103898
Here, since you're unnesting a two dimensional structure (i.e. you want to change both the rows and columns), you can just use unnest:
library(tidyr)
df <- as_tibble(df)
df
#> # A tibble: 4 × 3
#> `_id` variable myDate
#> <chr> <list> <dttm>
#> 1 a <df [7 × 4]> 2018-03-19 18:48:31
#> 2 b <df [2 × 4]> 2018-03-19 21:46:51
#> 3 c <df [1 × 3]> 2019-03-04 05:39:44
#> 4 d <df [1 × 7]> 2019-03-26 15:38:13
df |>
unnest(variable)
#> # A tibble: 11 × 10
#> `_id` type m_ t_ a_ b_ i_ e_ l_ myDate
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <df[> <chr> <dttm>
#> 1 a u m1 2015-07-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 2 a a m2 2016-04-… "" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 3 a u m3 2017-10-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 4 a a m4 2018-12-… "" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 5 a u m5 2019-04-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 6 a a m6 2016-05-… "C" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 7 a a m7 2016-06-… "C" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 8 b u m1 2018-05-… <NA> <NA> <NA> <NA> 2018-03-19 21:46:51
#> 9 b a m2 2019-04-… "" <NA> <NA> <NA> 2018-03-19 21:46:51
#> 10 c u m1 2018-02-… <NA> <NA> <NA> <NA> 2019-03-04 05:39:44
#> 11 d u m1 2016-05-… <NA> b1 i1 l1 2019-03-26 15:38:13
If you did want to do it in two steps, you could take advantage of the fact that unnest_longer()
now takes a tidyselect specification:
df |>
unnest_wider(variable) |>
unnest_longer(type:a_)
#> # A tibble: 11 × 10
#> `_id` type m_ t_ a_ b_ i_ e_ l_ myDate
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <df[> <chr> <dttm>
#> 1 a u m1 2015-07-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 2 a a m2 2016-04-… "" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 3 a u m3 2017-10-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 4 a a m4 2018-12-… "" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 5 a u m5 2019-04-… <NA> <NA> <NA> <NA> 2018-03-19 18:48:31
#> 6 a a m6 2016-05-… "C" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 7 a a m7 2016-06-… "C" <NA> <NA> <NA> 2018-03-19 18:48:31
#> 8 b u m1 2018-05-… <NA> <NA> <NA> <NA> 2018-03-19 21:46:51
#> 9 b a m2 2019-04-… "" <NA> <NA> <NA> 2018-03-19 21:46:51
#> 10 c u m1 2018-02-… <NA> <NA> <NA> <NA> 2019-03-04 05:39:44
#> 11 d u m1 2016-05-… <NA> b1 i1 l1 2019-03-26 15:38:13
Upvotes: 9
Reputation: 7725
This appears to work:
df %>% unnest_auto(variable) %>% unnest()
#Warning message:
#`cols` is now required.
#Please use `cols = c(type, m_, t_, a_, e_)`
df %>% unnest_auto(variable) %>% unnest(cols = c(type, m_, t_, a_, e_, l_))
# A tibble: 11 x 10
`_id` type m_ t_ a_ b_ i_ e_ l_ myDate
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <???> <chr> <dttm>
1 a u m1 2015-… NA NA NA NA NA 2018-03-20 02:48:31
2 a a m2 2016-… "" NA NA NA NA 2018-03-20 02:48:31
3 a u m3 2017-… NA NA NA NA NA 2018-03-20 02:48:31
4 a a m4 2018-… "" NA NA NA NA 2018-03-20 02:48:31
5 a u m5 2019-… NA NA NA NA NA 2018-03-20 02:48:31
6 a a m6 2016-… C NA NA NA NA 2018-03-20 02:48:31
7 a a m7 2016-… C NA NA NA NA 2018-03-20 02:48:31
8 b u m1 2018-… NA NA NA NA NA 2018-03-20 05:46:51
9 b a m2 2019-… "" NA NA NA NA 2018-03-20 05:46:51
10 c u m1 2018-… NA NA NA NA NA 2019-03-04 14:39:44
11 d u m1 2016-… NA b1 i1 NA l1 2019-03-26 23:38:13
Upvotes: 3