MayaGans

Reputation: 1845

Function to rbind a list of data frames with different columns and row names

I want to write a function that combines a list of data frames that have different numbers of columns and different row names, which I'd like to keep. Essentially, I want to stack the data frames so that each one's column names become just another row in the result.

df <- list()

df[[1]] <-  data.frame(d = c(4,5), e = c("c", "d"))
rownames(df[[1]]) <- c("df2_row_1", "df2_row_2")

df[[2]] <- data.frame(a = c(1,2,3), b = c("a", "b", "c"), c = c("one", "two", "three"))
rownames(df[[2]]) <- c("df1_row_1", "df1_row_2", "df1_row_3")


df[[3]] <- data.frame(f = c(6,7,8), g = c("e", "f", "g"), h = c("one", "two", "three"), w = c(100,101,102))
rownames(df[[3]]) <- c("df3_row_1", "df3_row_2", "df3_row_3")

Current Output:

library(dplyr)

do.call(bind_rows, df)

   d    e  a    b     c  f    g     h   w
1  4    c NA <NA>  <NA> NA <NA>  <NA>  NA
2  5    d NA <NA>  <NA> NA <NA>  <NA>  NA
3 NA <NA>  1    a   one NA <NA>  <NA>  NA
4 NA <NA>  2    b   two NA <NA>  <NA>  NA
5 NA <NA>  3    c three NA <NA>  <NA>  NA
6 NA <NA> NA <NA>  <NA>  6    e   one 100
7 NA <NA> NA <NA>  <NA>  7    f   two 101
8 NA <NA> NA <NA>  <NA>  8    g three 102

Desired Output:

          d e  
df2_row_1 4 c 
df2_row_2 5 d
          a b     c 
df1_row_1 1 a   one 
df1_row_2 2 b   two 
df1_row_3 3 c three 
          f g     h   w
df3_row_1 6 e   one 100
df3_row_2 7 f   two 101
df3_row_3 8 g three 102

I've tried (unsuccessfully) to write a function that finds the widest data frame, appends empty columns to the data frames that have fewer columns, and then gives all the data frames the same column names.

I also realize this couldn't be more NOT tidy - is this possible?

Thank you!!!

Upvotes: 0

Views: 465

Answers (1)

PavoDive

Reputation: 6496

This can be done with a for loop (I think it could also be done with mapply; see ?mapply). The overall strategy is to pad each data frame in the list with NA columns (using cbind) and then combine the padded list with rbindlist:

library(data.table)

cols <- max(sapply(df, ncol))

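# Number of NA cells needed to pad each data frame out to `cols` columns
# (named n_fill to avoid masking base::lengths):
n_fill <- (cols - sapply(df, ncol)) * sapply(df, nrow)

newdf <- list()

for (i in seq_along(df)) {
  if (ncol(df[[i]]) != cols) {
    # cbind a block of NA columns so this data frame ends up with `cols` columns
    newdf[[i]] <- cbind(df[[i]],
                        as.data.frame(matrix(rep(NA, n_fill[i]),
                                             ncol = n_fill[i] / nrow(df[[i]]))))
  } else {
    # Already the widest data frame: keep it as is
    newdf[[i]] <- df[[i]]
  }
}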

rbindlist(newdf, use.names = FALSE)

Which results in:

   d e    V1  V2
1: 4 c  <NA>  NA
2: 5 d  <NA>  NA
3: 1 a   one  NA
4: 2 b   two  NA
5: 3 c three  NA
6: 6 e   one 100
7: 7 f   two 101
8: 8 g three 102
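
Note that the result above keeps neither the row names nor the per-data-frame header rows shown in the question's desired output. If those matter, one possible approach is to give up on a data frame and build a character matrix instead. The following is only a minimal base R sketch, not part of the original answer: the helper name stack_with_headers is hypothetical, and it assumes every data frame has at least one column and that converting all values to character is acceptable.

```r
# Example data from the question
df <- list(
  data.frame(d = c(4, 5), e = c("c", "d"),
             row.names = c("df2_row_1", "df2_row_2")),
  data.frame(a = c(1, 2, 3), b = c("a", "b", "c"), c = c("one", "two", "three"),
             row.names = c("df1_row_1", "df1_row_2", "df1_row_3")),
  data.frame(f = c(6, 7, 8), g = c("e", "f", "g"), h = c("one", "two", "three"),
             w = c(100, 101, 102),
             row.names = c("df3_row_1", "df3_row_2", "df3_row_3"))
)

# Hypothetical helper: stack data frames of different widths into one
# character matrix, turning each data frame's column names into a header row.
stack_with_headers <- function(dfs) {
  width <- max(vapply(dfs, ncol, integer(1)))
  blocks <- lapply(dfs, function(x) {
    # Convert each column to character so headers and values can share a column
    vals <- do.call(cbind, lapply(x, as.character))
    # Header row first, then the values
    m <- rbind(names(x), vals)
    # Pad narrower blocks with empty strings up to the common width
    pad <- matrix("", nrow = nrow(m), ncol = width - ncol(m))
    block <- cbind(m, pad)
    # Header rows get an empty row name; data rows keep their original names
    dimnames(block) <- list(c("", rownames(x)), NULL)
    block
  })
  do.call(rbind, blocks)
}

res <- stack_with_headers(df)
print(res, quote = FALSE)
```

The result is a character matrix rather than a data frame, so the numeric columns are stored as text. That is the price of letting headers and values share a column, and it is probably only suitable for display or export.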

Upvotes: 1
