Reputation: 73
I have several dataframes in a list, which i want to merge into one big dataframe. The actual list contains several thouands of this dataframes and i am therefore looking for a preferably efficient solution.
The list looks similar to this:
v <- data.frame(answer = c(1,1,1))
rownames(v) <- c("A","B","C")
w <- data.frame(answer = c(1,0,0))
rownames(w) <- c("A","B","D")
x <- data.frame(answer = c(1,1,1))
rownames(x) <- c("A","B","C")
y <- data.frame(answer = c(0,0,0))
rownames(y) <- c("A","C","D")
z <- data.frame(answer = c(0,0,0,1))
rownames(z) <- c("A","B","C","D")
l <- list(v,w,x,y,z)
names(l) <- c("V","W","X","Y","Z")
The final output should look like this:
v W X Y Z
A 1 1 1 0 0
B 1 0 1 NA 0
C 1 NA 1 0 0
D NA 0 NA 0 1
What i have tried already (feel free to ignore this part, if you already have a working solution)
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=T),stringsAsFactors=FALSE)
and
df <- do.call(rbind.data.frame, l)
and
df<- rbindlist(l) (from library("data.frame"))
Those all loose the information contained in the rownames and only seemed to work if all dataframes have the same length and the same order.
The only one that kinda worked with my actual data was something along the lines of:
df<- suppressWarnings(Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by =
"answer", all = TRUE),l))
but i am not able to make it work with my example list and even when it worked it was extremly unefficiently and took ages once the list got longer.
Upvotes: 1
Views: 2058
Reputation:
One way of doing this using something similar to what kind of already worked for you is to first declare the rownames as a variable, then rename the columns of your data frames to match their names in the list, and then merge.
df_l <- l %>% Map(setNames, ., names(.)) %>%
map(~mutate(., r=rownames(.))) %>%
Reduce(function(dtf1,dtf2) full_join(dtf1,dtf2,by="r"), .)
rownames(df_l) <- df_l$r
df_l$r <- NULL
To be honest, I'm not sure it is efficient though, and like you said it will probably take long as the list grows.
Upvotes: 2
Reputation: 50678
Here is a base R solution using merge
and Reduce
:
df <- Reduce(
function(x, y) merge(x, y, by = "id", all = T),
lapply(l, function(x) { x$id <- rownames(x); x }))
colnames(df) <- c("id", names(l))
# id V W X Y Z
#1 A 1 1 1 0 0
#2 B 1 0 1 NA 0
#3 C 1 NA 1 0 0
#4 D NA 0 NA 0 1
Upvotes: 5
Reputation: 887118
We create a row names column and then do the join. We loop through the list
with map
, create a row names column with rownames_to_column
and reduce
to a single dataset by doing a full_join
by
the row names and rename
the column names if needed
library(tidyverse)
l %>%
map( ~ .x %>%
rownames_to_column('rn')) %>%
reduce(full_join, by = 'rn') %>%
rename_at(2:6, ~ names(l))
# rn V W X Y Z
#1 A 1 1 1 0 0
#2 B 1 0 1 NA 0
#3 C 1 NA 1 0 0
#4 D NA 0 NA 0 1
Or another option is to bind_rows
and then spread
l %>%
map(rownames_to_column, 'rn') %>%
bind_rows(.id = 'grp') %>%
spread(grp, answer)
# rn V W X Y Z
#1 A 1 1 1 0 0
#2 B 1 0 1 NA 0
#3 C 1 NA 1 0 0
#4 D NA 0 NA 0 1
Upvotes: 3