Pablowa
Pablowa

Reputation: 73

Merge list of uneven dataframes by rownames

I have several dataframes in a list, which i want to merge into one big dataframe. The actual list contains several thouands of this dataframes and i am therefore looking for a preferably efficient solution.

The list looks similar to this:

v <- data.frame(answer = c(1,1,1))
rownames(v) <- c("A","B","C")
w <- data.frame(answer = c(1,0,0))
rownames(w) <- c("A","B","D")
x <- data.frame(answer = c(1,1,1))
rownames(x) <- c("A","B","C")
y <- data.frame(answer = c(0,0,0))
rownames(y) <- c("A","C","D")
z <- data.frame(answer = c(0,0,0,1))
rownames(z) <- c("A","B","C","D")
l <- list(v,w,x,y,z)
names(l) <- c("V","W","X","Y","Z")

The final output should look like this:

    v    W    X    Y    Z
A   1    1    1    0    0
B   1    0    1    NA   0
C   1    NA   1    0    0
D   NA   0    NA   0    1

What i have tried already (feel free to ignore this part, if you already have a working solution)

  df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=T),stringsAsFactors=FALSE)

and

df <- do.call(rbind.data.frame, l)

and

df<- rbindlist(l) (from library("data.frame"))

Those all loose the information contained in the rownames and only seemed to work if all dataframes have the same length and the same order.

The only one that kinda worked with my actual data was something along the lines of:

df<- suppressWarnings(Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = 
"answer", all = TRUE),l))

but i am not able to make it work with my example list and even when it worked it was extremly unefficiently and took ages once the list got longer.

Upvotes: 1

Views: 2058

Answers (3)

user9607487
user9607487

Reputation:

One way of doing this using something similar to what kind of already worked for you is to first declare the rownames as a variable, then rename the columns of your data frames to match their names in the list, and then merge.

df_l <- l %>% Map(setNames, ., names(.)) %>%
  map(~mutate(., r=rownames(.))) %>% 
  Reduce(function(dtf1,dtf2) full_join(dtf1,dtf2,by="r"), .)

rownames(df_l) <- df_l$r
df_l$r <- NULL

To be honest, I'm not sure it is efficient though, and like you said it will probably take long as the list grows.

Upvotes: 2

Maurits Evers
Maurits Evers

Reputation: 50678

Here is a base R solution using merge and Reduce:

df <- Reduce(
    function(x, y) merge(x, y, by = "id", all = T),
    lapply(l, function(x) { x$id <- rownames(x); x }))
colnames(df) <- c("id", names(l))
#  id  V  W  X  Y Z
#1  A  1  1  1  0 0
#2  B  1  0  1 NA 0
#3  C  1 NA  1  0 0
#4  D NA  0 NA  0 1

Upvotes: 5

akrun
akrun

Reputation: 887118

We create a row names column and then do the join. We loop through the list with map, create a row names column with rownames_to_column and reduce to a single dataset by doing a full_join by the row names and rename the column names if needed

library(tidyverse)
l %>% 
   map( ~ .x %>%
             rownames_to_column('rn')) %>%
   reduce(full_join, by = 'rn') %>%
   rename_at(2:6, ~ names(l))
#  rn  V  W  X  Y Z
#1  A  1  1  1  0 0
#2  B  1  0  1 NA 0
#3  C  1 NA  1  0 0
#4  D NA  0 NA  0 1

Or another option is to bind_rows and then spread

l %>%
  map(rownames_to_column, 'rn') %>% 
  bind_rows(.id = 'grp') %>%
  spread(grp, answer)
#  rn  V  W  X  Y Z
#1  A  1  1  1  0 0
#2  B  1  0  1 NA 0
#3  C  1 NA  1  0 0
#4  D NA  0 NA  0 1

Upvotes: 3

Related Questions