Ian Campbell

Reputation: 24790

Tidyverse approach to binding unnamed list of unnamed vectors by row - do.call(rbind,x) equivalent

I often find questions where people have somehow ended up with an unnamed list of unnamed character vectors and they want to bind them row-wise into a data.frame. Here is an example:

library(magrittr)
data <- cbind(LETTERS[1:3],1:3,4:6,7:9,c(12,15,18)) %>%
  split(1:3) %>% unname
data
#[[1]]
#[1] "A"  "1"  "4"  "7"  "12"
#
#[[2]]
#[1] "B"  "2"  "5"  "8"  "15"
#
#[[3]]
#[1] "C"  "3"  "6"  "9"  "18"

One typical approach is with do.call from base R.

do.call(rbind, data) %>% as.data.frame
#  V1 V2 V3 V4 V5
#1  A  1  4  7 12
#2  B  2  5  8 15
#3  C  3  6  9 18
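
As a minimal sketch of what `do.call` does here: it passes each list element as a separate argument, so for this three-element list it should match calling `rbind` by hand. The literal `data` below is re-created for self-containment and is assumed to match the example above; `m1` and `m2` are illustrative names.

```r
# Same structure as the example: an unnamed list of unnamed character vectors
data <- list(
  c("A", "1", "4", "7", "12"),
  c("B", "2", "5", "8", "15"),
  c("C", "3", "6", "9", "18")
)

# do.call() spreads the list elements out as individual arguments to rbind()
m1 <- do.call(rbind, data)

# The equivalent call written out by hand for a list of length 3
m2 <- rbind(data[[1]], data[[2]], data[[3]])

identical(m1, m2)  # both are 3 x 5 character matrices
```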

A probably less efficient approach is Reduce from base R, which binds one vector at a time.

Reduce(rbind,data, init = NULL) %>% as.data.frame
#  V1 V2 V3 V4 V5
#1  A  1  4  7 12
#2  B  2  5  8 15
#3  C  3  6  9 18

However, when we consider more modern packages such as dplyr or data.table, some of the approaches that might immediately come to mind don't work: bind_rows fails because the vectors are unnamed, and rbindlist fails because the list elements are atomic vectors rather than lists.

library(dplyr)
bind_rows(data)
#Error: Argument 1 must have names
library(data.table)
rbindlist(data)
#Error in rbindlist(data) : 
#  Item 1 of input is not a data.frame, data.table or list
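
Both errors come down to the shape of the elements. A quick base R check, assuming the same `data` as above (re-created here so the snippet stands alone; `rows` is an illustrative name):

```r
data <- list(
  c("A", "1", "4", "7", "12"),
  c("B", "2", "5", "8", "15"),
  c("C", "3", "6", "9", "18")
)

# The elements carry no names, which is what bind_rows() complains about
no_names <- is.null(names(data[[1]]))

# The elements are atomic character vectors, not lists,
# which is what rbindlist() complains about
not_list <- !is.list(data[[1]])

# Converting each element with as.list() gives the list-of-lists shape
rows <- lapply(data, as.list)
all_lists <- all(vapply(rows, is.list, logical(1)))
```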

One approach might be to set_names on the vectors.

library(purrr)
map_df(data, ~set_names(.x, seq_along(.x)))
# A tibble: 3 x 5
#  `1`   `2`   `3`   `4`   `5`  
#  <chr> <chr> <chr> <chr> <chr>
#1 A     1     4     7     12   
#2 B     2     5     8     15   
#3 C     3     6     9     18  

However, this seems like more steps than should be necessary.

Therefore, my question is: what is an efficient tidyverse or data.table approach to binding an unnamed list of unnamed character vectors into a data.frame row-wise?

Upvotes: 33

Views: 2278

Answers (8)

Anoushiravan R

Reputation: 21908

I think this could be added to an already complete set of very good answers to this question:

library(rlang) # Or purrr

data %>%
  exec(rbind, !!!.) %>%
  as_tibble() %>%
  set_names(~ letters[seq_along(.)])

# A tibble: 3 x 5
  a     b     c     d     e    
  <chr> <chr> <chr> <chr> <chr>
1 A     1     4     7     12   
2 B     2     5     8     15   
3 C     3     6     9     18  

Upvotes: 1

tmfmnk

Reputation: 39858

Not entirely sure about efficiency, but a compact option using purrr and tibble could be:

map_dfc(purrr::transpose(data), ~ unlist(tibble(.)))

  V1    V2    V3    V4    V5   
  <chr> <chr> <chr> <chr> <chr>
1 A     1     4     7     12   
2 B     2     5     8     15   
3 C     3     6     9     18  

Upvotes: 15

markus

Reputation: 26343

Edit

Use @sindri_baldur's approach: https://stackoverflow.com/a/61660119/8583393


A way with data.table, similar to what @tmfmnk showed

library(data.table)
as.data.table(transpose(data))
#   V1 V2 V3 V4 V5
#1:  A  1  4  7 12
#2:  B  2  5  8 15
#3:  C  3  6  9 18

Upvotes: 11

user10917479

Reputation:

This seems rather compact. I believe this is what powers bind_rows() from dplyr, and therefore map_df() in purrr, so it should be fairly efficient.

library(vctrs)

vec_rbind(!!!data)

This gives a data.frame.

  ...1 ...2 ...3 ...4 ...5
1    A    1    4    7   12
2    B    2    5    8   15
3    C    3    6    9   18

Some Benchmarks

It seems like the .name_repair within the tidyverse methods is a severe bottleneck. I took a few fairly straightforward options that also seemed to run the quickest from the other posts (thanks H 1 and sindri_baldur).

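library(microbenchmark)
library(purrr)
library(tibble)
library(data.table) # loaded last so that transpose() refers to data.table's version
microbenchmark(vctrs = vec_rbind(!!!data),
               dt = rbindlist(lapply(data, as.list)),
               map = map_df(data, as_tibble_row, .name_repair = "unique"),
               base = as.data.frame(do.call(rbind, data)))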

[Image: benchmark 1]

But if you first name the vectors (but not necessarily the list elements), you get a different story.

data2 <- modify(data, ~set_names(.x, seq(.x)))

microbenchmark(vctrs = vec_rbind(!!!data2),
               dt = rbindlist(lapply(data2, as.list)),
               map = map_df(data2, as_tibble_row),
               base = as.data.frame(do.call(rbind, data2)))

[Image: benchmark 2]

In fact, you can include the time needed to name the vectors in the vec_rbind() solution, but not in the others, and still see fairly high performance.

microbenchmark(vctrs = vec_rbind(!!!modify(data, ~set_names(.x, seq(.x)))),
               dt = setDF(data.table::transpose(data)), # qualified: purrr also exports transpose()
               map = map_df(data2, as_tibble_row),
               base = as.data.frame(do.call(rbind, data)))

[Image: final benchmark]

For what it's worth.
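
The timings above use the three-row example, so they mostly measure fixed overhead. A rough sketch of how one might repeat the comparison on larger input, using only base R; the name `big` and the factor of 1000 are arbitrary choices for illustration, not part of the original benchmarks:

```r
data <- list(
  c("A", "1", "4", "7", "12"),
  c("B", "2", "5", "8", "15"),
  c("C", "3", "6", "9", "18")
)

# Repeat the three vectors to get a longer unnamed list (3000 rows here)
big <- rep(data, 1000)

# Time one candidate; other approaches can be swapped in for the same input
timing <- system.time(
  res <- as.data.frame(do.call(rbind, big))
)

dim(res)
timing[["elapsed"]]
```

The same `big` list could be passed to microbenchmark() in place of `data` to compare all approaches at once.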

Upvotes: 9

s_baldur

Reputation: 33488

library(data.table)
setDF(transpose(data))

  V1 V2 V3 V4 V5
1  A  1  4  7 12
2  B  2  5  8 15
3  C  3  6  9 18
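
For readers unfamiliar with transposing a list: the idea is to turn a list of rows into a list of columns, which is already the shape of a data.frame. The following is a rough base R sketch of that idea, not data.table's actual implementation, and it assumes all vectors have the same length. The names `n_cols`, `cols` and `df` are illustrative.

```r
data <- list(
  c("A", "1", "4", "7", "12"),
  c("B", "2", "5", "8", "15"),
  c("C", "3", "6", "9", "18")
)

n_cols <- length(data[[1]])

# For each column position j, collect the j-th element of every row vector
cols <- lapply(seq_len(n_cols), function(j) {
  vapply(data, function(row) row[j], character(1))
})

# A named list of equal-length vectors converts directly to a data.frame
names(cols) <- paste0("V", seq_len(n_cols))
df <- as.data.frame(cols, stringsAsFactors = FALSE)
df
```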

Upvotes: 10

lroha

Reputation: 34441

Here is a slight variation on tmfmnk's suggested approach, using as_tibble_row() to convert the vectors into single-row tibbles. It's also necessary to use the .name_repair argument, because the vectors are unnamed:

library(purrr)
library(tibble)

map_df(data, as_tibble_row, .name_repair = ~paste0("value", seq(.x)))

# A tibble: 3 x 5
  value1 value2 value3 value4 value5
  <chr>  <chr>  <chr>  <chr>  <chr> 
1 A      1      4      7      12    
2 B      2      5      8      15    
3 C      3      6      9      18

Upvotes: 3

jangorecki

Reputation: 16697

My approach would be to just turn those list entries into the type rbindlist expects, a list:

rbindlist(lapply(data, as.list))
#       V1     V2     V3     V4     V5
#   <char> <char> <char> <char> <char>
#1:      A      1      4      7     12
#2:      B      2      5      8     15
#3:      C      3      6      9     18

If you want the columns converted from character to more appropriate types, lapply can help here as well. The first lapply is called for every row; the second is called for every column.

rbindlist(lapply(data, as.list))[, lapply(.SD, type.convert)]
       V1    V2    V3    V4    V5
   <fctr> <int> <int> <int> <int>
1:      A     1     4     7    12
2:      B     2     5     8    15
3:      C     3     6     9    18
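
The per-column conversion step can also be sketched in base R on a plain data.frame. This variant passes `as.is = TRUE` to `type.convert`, which is an assumption that differs from the output above: it keeps the first column as character instead of turning it into a factor.

```r
data <- list(
  c("A", "1", "4", "7", "12"),
  c("B", "2", "5", "8", "15"),
  c("C", "3", "6", "9", "18")
)

# Bind the rows first; every column starts out as character
df <- as.data.frame(do.call(rbind, data), stringsAsFactors = FALSE)

# Convert each column separately; df[] keeps the data.frame structure
df[] <- lapply(df, type.convert, as.is = TRUE)

# Inspect the resulting column classes
col_classes <- vapply(df, class, character(1))
col_classes
```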

Upvotes: 6

akrun

Reputation: 887118

An option with unnest_wider

library(tibble)
library(tidyr)
library(stringr)
tibble(col = data) %>%
    unnest_wider(c(col), names_repair = ~ str_c('value', seq_along(.)))
# A tibble: 3 x 5
#  value1 value2 value3 value4 value5
#  <chr>  <chr>  <chr>  <chr>  <chr> 
#1 A      1      4      7      12    
#2 B      2      5      8      15    
#3 C      3      6      9      18    

Upvotes: 5
