Reputation: 5169
I have the following tibble, which has two nested columns:
library(tidyverse)
df <- structure(list(a = list(c("a", "b"), "c"), b = list(c("1", "2",
"3"), "3"), c = c(11, 22)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))
Which produces:
# A tibble: 2 x 3
a b c
<list> <list> <dbl>
1 <chr [2]> <chr [3]> 11
2 <chr [1]> <chr [1]> 22
How can I unnest them at once producing one single tibble?
I tried this but fail:
> df %>% unnest(a, b)
Error: All nested columns must have the same number of elements.
Upvotes: 9
Views: 6971
Reputation: 153
Use unnest_cross()
(and be careful if list-cols are missing data --> keep_empty = TRUE
):
unnest_cross <- function(data, cols, ...) {
.df_out <- data
.cols <- tidyselect::eval_select(rlang::enquo(cols), data)
purrr::walk(
.cols,
function(col) {
.df_out <<- unnest(.df_out, {{ col }}, ...)
}
)
.df_out
}
unnest()
unnest
has handled multiple columns since v0.3.0
(2015). It currently uses the cols
argument, which accepts typical tidyverse selection methods.
Note that it's specifically designed to reverse nest()
ed data.frames and requires list columns to be "parallel entries ... of compatible sizes". This means:
df <- structure(list(
a = list(c("a", "b"), "c"),
b = list(c("1", "2", "3"), "3"),
c = c(11, 22)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))
tidyr::unnest(df, cols = tidyselect::everything())
#> Error in `fn()`:
#> ! In row 1, can't recycle input of size 2 to size 3.
unnest()
ing (e.g. a cartesian product).# "parallel"/"compatible" data.frame
df_parallel <- structure(list(
a = list(c("a", "b", "c"), "c"),
b = list(c("1", "2", "3"), "3"),
c = c(11, 22)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))
tidyr::unnest(df_parallel, cols = tidyselect::everything())
#> # A tibble: 4 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 b 2 11
#> 3 c 3 11
#> 4 c 3 22
unnest_cross()
Detailsunnest_cross()
uses purrr::walk()
to cycle through the specified columns and unnest()
them, saving the result each time via superassignment (with <<-
). It's name is derived from similarity to purrr::cross()
because it always produces a cartesian product of list columns in a data.frame, even when they are "parallel entries" and/or "of compatible sizes"
# For original data.frame
unnest_cross(df, cols = tidyselect::everything())
#> # A tibble: 7 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 a 2 11
#> 3 a 3 11
#> 4 b 1 11
#> 5 b 2 11
#> 6 b 3 11
#> 7 c 3 22
df_parallel
, which is very different from unnest()
.# For df with list-columns of "compatible size"
unnest_cross(df_parallel, cols = tidyselect::everything())
#> # A tibble: 10 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 a 2 11
#> 3 a 3 11
#> 4 b 1 11
#> 5 b 2 11
#> 6 b 3 11
#> 7 c 1 11
#> 8 c 2 11
#> 9 c 3 11
#> 10 c 3 22
Created on 2022-06-03 by the reprex package (v2.0.1)
Upvotes: 5
Reputation: 6795
There's probably a cleaner way to do it, but if you want the cartesian product for the columns you can unnest them in sequence, if nothing else:
> df %>%
unnest(a, .drop = FALSE) %>%
unnest(b, .drop = FALSE)
# # A tibble: 7 x 3
# c a b
# <dbl> <chr> <chr>
# 1 11 a 1
# 2 11 a 2
# 3 11 a 3
# 4 11 b 1
# 5 11 b 2
# 6 11 b 3
# 7 22 c 3
Upvotes: 9