littleworth
littleworth

Reputation: 5169

How to unnest multiple list columns of a dataframe in one go with dplyr pipe

I have the following tibble, which has two nested columns:

library(tidyverse)
df <- structure(list(a = list(c("a", "b"), "c"), b = list(c("1", "2", 
"3"), "3"), c = c(11, 22)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))

Which produces:

# A tibble: 2 x 3
  a         b             c
  <list>    <list>    <dbl>
1 <chr [2]> <chr [3]>    11
2 <chr [1]> <chr [1]>    22

How can I unnest them at once producing one single tibble?

I tried this but fail:

> df %>% unnest(a, b)
Error: All nested columns must have the same number of elements.

Upvotes: 9

Views: 6971

Answers (2)

Allen Baron
Allen Baron

Reputation: 153

tl;dr

Use unnest_cross() (and be careful if list-cols are missing data --> keep_empty = TRUE):

unnest_cross <- function(data, cols, ...) {
    .df_out <- data
    .cols <- tidyselect::eval_select(rlang::enquo(cols), data)
    purrr::walk(
        .cols,
        function(col) {
            .df_out <<- unnest(.df_out, {{ col }}, ...)
        }
    )
    .df_out
}

Background: Multiple list-columns with unnest()

unnest has handled multiple columns since v0.3.0 (2015). It currently uses the cols argument, which accepts typical tidyverse selection methods.

Note that it's specifically designed to reverse nest()ed data.frames and requires list columns to be "parallel entries ... of compatible sizes". This means:

  1. It doesn't work with the OP's data.frame.
df <- structure(list(
    a = list(c("a", "b"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df, cols = tidyselect::everything())
#> Error in `fn()`:
#> ! In row 1, can't recycle input of size 2 to size 3.
  1. It will not produce the same output as sequential list-column unnest()ing (e.g. a cartesian product).
# "parallel"/"compatible" data.frame
df_parallel <- structure(list(
    a = list(c("a", "b", "c"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df_parallel, cols = tidyselect::everything())
#> # A tibble: 4 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 b     2        11
#> 3 c     3        11
#> 4 c     3        22

unnest_cross() Details

unnest_cross() uses purrr::walk() to cycle through the specified columns and unnest() them, saving the result each time via superassignment (with <<-). It's name is derived from similarity to purrr::cross() because it always produces a cartesian product of list columns in a data.frame, even when they are "parallel entries" and/or "of compatible sizes"

  1. It works as desired for the original data.frame (with list-columns of unequal length):
# For original data.frame
unnest_cross(df, cols = tidyselect::everything())
#> # A tibble: 7 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 a     2        11
#> 3 a     3        11
#> 4 b     1        11
#> 5 b     2        11
#> 6 b     3        11
#> 7 c     3        22
  1. It creates the cartesian product of df_parallel, which is very different from unnest() .
# For df with list-columns of "compatible size"
unnest_cross(df_parallel, cols = tidyselect::everything())
#> # A tibble: 10 × 3
#>    a     b         c
#>    <chr> <chr> <dbl>
#>  1 a     1        11
#>  2 a     2        11
#>  3 a     3        11
#>  4 b     1        11
#>  5 b     2        11
#>  6 b     3        11
#>  7 c     1        11
#>  8 c     2        11
#>  9 c     3        11
#> 10 c     3        22

Created on 2022-06-03 by the reprex package (v2.0.1)

Upvotes: 5

Jared Wilber
Jared Wilber

Reputation: 6795

There's probably a cleaner way to do it, but if you want the cartesian product for the columns you can unnest them in sequence, if nothing else:

> df %>% 
    unnest(a, .drop = FALSE) %>% 
    unnest(b, .drop = FALSE)

# # A tibble: 7 x 3
#       c a     b    
#   <dbl> <chr> <chr>
# 1    11 a     1    
# 2    11 a     2    
# 3    11 a     3    
# 4    11 b     1    
# 5    11 b     2    
# 6    11 b     3    
# 7    22 c     3

Upvotes: 9

Related Questions