Reputation: 1845
I want to create a function that merges a list of dataframes with different column numbers and the rows have different names that I'd like to keep. Essentially I want to stack dataframes where the column names just become another row to be appended.
df <- list()
df[[1]] <- data.frame(d = c(4,5), e = c("c", "d"))
rownames(df[[1]]) <- c("df2_row_1", "df2_row_2")
df[[2]] <- data.frame(a = c(1,2,3), b = c("a", "b", "c"), c = c("one", "two", "three"))
rownames(df[[2]]) <- c("df1_row_1", "df1_row_2", "df1_row_3")
df[[3]] <- data.frame(f = c(6,7,8), g = c("e", "f", "g"), h = c("one", "two", "three"), w = c(100,101,102))
rownames(df[[3]]) <- c("df3_row_1", "df3_row_2", "df3_row_3")
do.call(bind_rows, df)
d e a b c f g h w
1 4 c NA <NA> <NA> NA <NA> <NA> NA
2 5 d NA <NA> <NA> NA <NA> <NA> NA
3 NA <NA> 1 a one NA <NA> <NA> NA
4 NA <NA> 2 b two NA <NA> <NA> NA
5 NA <NA> 3 c three NA <NA> <NA> NA
6 NA <NA> NA <NA> <NA> 6 e one 100
7 NA <NA> NA <NA> <NA> 7 f two 101
8 NA <NA> NA <NA> <NA> 8 g three 102
d e
df2_row_1 4 c
df2_row_2 5 d
a b c
df1_row_1 1 a one
df1_row_2 2 b two
df1_row_3 3 c three
f g h w
df3_row_1 6 e one 100
df3_row_2 7 f two 101
df3_row_3 8 g three 102
I've tried (unsuccessfully) creating a function that finds the longest data frame, then appends empty columns to the data frames that are shorter than the longest, then gives all the data frames the same name for each of those columns.
I also realize this couldn't be more NOT tidy - is this possible?
Thank you!!!
Upvotes: 0
Views: 465
Reputation: 6496
This can be achieved with a for loop (I think it could be achieved with mapply
to, check ?mapply
). The overall strategy is filling each df in the list with NAs (cbind
ing them) and then rbindlist
ing the resulting list:
library(data.table)
cols <- max(sapply(df, ncol))
# This is the length of the NA vectors that make the cbinding dfs:
lengths <- (cols - sapply(df, ncol))*sapply(df, nrow)
newdf <- list()
for (i in 1:length(df)){
if (ncol(df[[i]]) != cols){
newdf[[i]] <- cbind(df[[i]],
as.data.frame(matrix(rep(NA, lengths[i]),
ncol = lengths[i] / nrow(df[[i]]))))
} else {
newdf[[i]] <- df[[i]]
}
}
rbindlist(newdf, use.names = FALSE)
Which results in:
d e V1 V2
1: 4 c <NA> NA
2: 5 d <NA> NA
3: 1 a one NA
4: 2 b two NA
5: 3 c three NA
6: 6 e one 100
7: 7 f two 101
8: 8 g three 102
Upvotes: 1