Reputation: 24790
I often find questions where people have somehow ended up with an unnamed list of unnamed character vectors and want to bind them row-wise into a data.frame. Here is an example:
library(magrittr)
data <- cbind(LETTERS[1:3],1:3,4:6,7:9,c(12,15,18)) %>%
split(1:3) %>% unname
data
#[[1]]
#[1] "A" "1" "4" "7" "12"
#
#[[2]]
#[1] "B" "2" "5" "8" "15"
#
#[[3]]
#[1] "C" "3" "6" "9" "18"
One typical approach is with do.call from base R.
do.call(rbind, data) %>% as.data.frame
# V1 V2 V3 V4 V5
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
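As a minimal sketch of what do.call is doing here: it passes each list element to rbind as a separate argument, so the call is equivalent to writing out every element by hand. The three-column data and the column names id, x and y below are made up for illustration.

```r
# Small stand-in for the question's data (hypothetical values)
data <- list(c("A", "1", "4"), c("B", "2", "5"), c("C", "3", "6"))

# do.call() spreads the list elements out as separate arguments to rbind()
m1 <- do.call(rbind, data)
m2 <- rbind(data[[1]], data[[2]], data[[3]])

# Both calls build the same 3 x 3 character matrix
stopifnot(identical(dim(m1), dim(m2)), all(m1 == m2))

# Naming the columns before converting avoids the default V1, V2, ... names
colnames(m1) <- c("id", "x", "y")  # hypothetical names
df <- as.data.frame(m1, stringsAsFactors = FALSE)
df
```

Every column is still character at this point, because the intermediate matrix can only hold one type.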
A perhaps less efficient approach is with Reduce from base R.
Reduce(rbind,data, init = NULL) %>% as.data.frame
# V1 V2 V3 V4 V5
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
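One likely reason Reduce is slower is that it binds one row at a time, so the growing matrix is copied at every step, whereas do.call makes a single rbind call. The sketch below is a rough way to check this on your own machine with base R only. The 2000-row input is invented for illustration, and no timings are quoted because they depend on the machine.

```r
# Hypothetical larger input: 2000 unnamed character vectors of length 5
big <- replicate(2000, as.character(sample(100, 5)), simplify = FALSE)

# One rbind() call over all elements
t_docall <- system.time(m1 <- do.call(rbind, big))

# Repeated pairwise rbind(), growing the result row by row
t_reduce <- system.time(m2 <- Reduce(rbind, big, init = NULL))

# Both routes produce the same values
stopifnot(identical(dim(m1), dim(m2)), all(m1 == m2))

# Elapsed seconds for each approach
c(do.call = t_docall[["elapsed"]], Reduce = t_reduce[["elapsed"]])
```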
However, with more modern packages such as dplyr or data.table, some of the approaches that might immediately come to mind don't work because the vectors are unnamed or aren't lists.
library(dplyr)
bind_rows(data)
#Error: Argument 1 must have names
library(data.table)
rbindlist(data)
#Error in rbindlist(data) :
# Item 1 of input is not a data.frame, data.table or list
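The rbindlist error can be read fairly literally: each element of data is an atomic character vector, not a list. A small base R sketch (with made-up two-element vectors) shows the difference, and how as.list turns each vector into the list-of-fields shape that row-binding functions generally expect.

```r
# Hypothetical stand-in for the question's data
data <- list(c("A", "1"), c("B", "2"))

# Each element is an atomic character vector, not a list
is.list(data[[1]])       # FALSE
is.character(data[[1]])  # TRUE

# as.list() splits a vector into one list entry per value
rows <- lapply(data, as.list)
is.list(rows[[1]])       # TRUE
length(rows[[1]])        # 2
```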
One approach might be to apply set_names to the vectors.
library(purrr)
map_df(data, ~set_names(.x, seq_along(.x)))
# A tibble: 3 x 5
# `1` `2` `3` `4` `5`
# <chr> <chr> <chr> <chr> <chr>
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
However, this seems like more steps than should be necessary.
Therefore, my question is: what is an efficient tidyverse or data.table approach to binding an unnamed list of unnamed character vectors into a data.frame row-wise?
Upvotes: 33
Views: 2278
Reputation: 21908
I think this could be added to an already complete set of very good answers to this question:
library(rlang) # or purrr; both provide exec() and set_names()
library(tibble) # as_tibble()
data %>%
exec(rbind, !!!.) %>%
as_tibble() %>%
set_names(~ letters[seq_along(.)])
# A tibble: 3 x 5
a b c d e
<chr> <chr> <chr> <chr> <chr>
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Upvotes: 1
Reputation: 39858
Not entirely sure about efficiency, but a compact option using purrr and tibble could be:
map_dfc(purrr::transpose(data), ~ unlist(tibble(.)))
V1 V2 V3 V4 V5
<chr> <chr> <chr> <chr> <chr>
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Upvotes: 15
Reputation: 26343
Edit
Use @sindri_baldur's approach: https://stackoverflow.com/a/61660119/8583393
A way with data.table, similar to what @tmfmnk showed:
library(data.table)
as.data.table(transpose(data))
# V1 V2 V3 V4 V5
#1: A 1 4 7 12
#2: B 2 5 8 15
#3: C 3 6 9 18
Upvotes: 11
Reputation:
This seems rather compact. I believe this is what powers bind_rows() from dplyr and therefore map_df() in purrr, so it should be fairly efficient.
library(vctrs)
vec_rbind(!!!data)
This gives a data.frame.
...1 ...2 ...3 ...4 ...5
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
It seems like the .name_repair step within the tidyverse methods is a severe bottleneck. I benchmarked a few fairly straightforward options from the other posts that also seemed to run the quickest (thanks H 1 and sindri_baldur).
library(microbenchmark)
microbenchmark(vctrs = vec_rbind(!!!data),
dt = rbindlist(lapply(data, as.list)),
map = map_df(data, as_tibble_row, .name_repair = "unique"),
base = as.data.frame(do.call(rbind, data)))
But if you first name the vectors (but not necessarily the list elements), you get a different story.
data2 <- modify(data, ~set_names(.x, seq_along(.x)))
microbenchmark(vctrs = vec_rbind(!!!data2),
dt = rbindlist(lapply(data2, as.list)),
map = map_df(data2, as_tibble_row),
base = as.data.frame(do.call(rbind, data2)))
In fact, you can include the time to name the vectors in the vec_rbind() solution and not the others, and still see fairly high performance.
microbenchmark(vctrs = vec_rbind(!!!modify(data, ~set_names(.x, seq_along(.x)))),
dt = setDF(transpose(data)),
map = map_df(data2, as_tibble_row),
base = as.data.frame(do.call(rbind, data)))
For what it's worth.
Upvotes: 9
Reputation: 33488
library(data.table)
setDF(transpose(data))
V1 V2 V3 V4 V5
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Upvotes: 10
Reputation: 34441
Here is a slight variation on tmfmnk's suggested approach, using as_tibble_row() to convert the vectors into single-row tibbles. It's also necessary to use the .name_repair argument:
library(purrr)
library(tibble)
map_df(data, as_tibble_row, .name_repair = ~paste0("value", seq(.x)))
# A tibble: 3 x 5
value1 value2 value3 value4 value5
<chr> <chr> <chr> <chr> <chr>
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Upvotes: 3
Reputation: 16697
My approach would be to just turn those list entries into the type rbindlist expects, a list:
rbindlist(lapply(data, as.list))
# V1 V2 V3 V4 V5
# <char> <char> <char> <char> <char>
#1: A 1 4 7 12
#2: B 2 5 8 15
#3: C 3 6 9 18
If you want your data types to be adjusted from character to the appropriate types, then lapply can help here as well. The first lapply is called for every row, the second lapply is called for every column.
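# as.is = FALSE gives the factor column shown below; R >= 4.0 warns if as.is is omitted
rbindlist(lapply(data, as.list))[, lapply(.SD, type.convert, as.is = FALSE)]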
V1 V2 V3 V4 V5
<fctr> <int> <int> <int> <int>
1: A 1 4 7 12
2: B 2 5 8 15
3: C 3 6 9 18
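For comparison, the column-wise conversion step can be sketched in base R alone. This is an illustration on made-up three-column data, not the data.table method above. It uses as.is = TRUE, so text columns stay character instead of becoming factors.

```r
# Hypothetical stand-in for the question's data
data <- list(c("A", "1", "4"), c("B", "2", "5"), c("C", "3", "6"))

# Bind row-wise; every column starts out as character
df <- as.data.frame(do.call(rbind, data), stringsAsFactors = FALSE)

# Convert each column separately; df[] keeps the data.frame structure
df[] <- lapply(df, type.convert, as.is = TRUE)

# Inspect the resulting column classes
vapply(df, class, character(1))
```

Passing as.is explicitly also avoids the warning that recent versions of R emit when it is left out.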
Upvotes: 6
Reputation: 887118
An option with unnest_wider from tidyr:
library(tibble)
library(tidyr)
library(stringr)
tibble(col = data) %>%
unnest_wider(c(col), names_repair = ~ str_c('value', seq_along(.)))
# A tibble: 3 x 5
# value1 value2 value3 value4 value5
# <chr> <chr> <chr> <chr> <chr>
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
Upvotes: 5