Reputation: 83
I'm trying to combine data frames (hundreds of them), but they have different numbers of rows.
df1 <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
df2 <- data.frame(c(7,5,3,4,5,8,5), c(43,56,23,78,89,45,78))
df3 <- data.frame(c(7,5,3,4,5,8,5,6,7), c(43,56,23,78,89,45,78,56,67))
colnames(df1) <- c("xVar1","xVar2")
colnames(df2) <- c("yVar1","yVar2")
colnames(df3) <- c("zVar1","zVar2")
a1 <- list(df1,df2,df3)
a1 is what is my initial data actually looks like when I get it.
Now if I do:
b1 <- as.data.frame(a1)
I get an error, because the # of rows is not the same in the data (this would work fine if the # of rows was the same).
How do I make the # of rows equal or work around this issue?
I would like to be able to merge the data in this way (here is a working example with the same # of rows):
df1b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
df2b <- data.frame(c(7,5,3,4,6), c(43,56,24,48,89))
df3b <- data.frame(c(7,5,3,4,5), c(43,56,23,78,89))
colnames(df1b) <- c("xVar1","xVar2")
colnames(df2b) <- c("yVar1","yVar2")
colnames(df3b) <- c("zVar1","zVar2")
a2 <- list(df1b,df2b,df3b)
b2 <- as.data.frame(a2)
Thanks!
Upvotes: 0
Views: 108
Reputation: 389
One can add a key (row count as variable value in this case) to each dataframe then merge by the key.
# get list of dfs (should prob import data into a list of dfs instead)
list_df<-mget(ls(pattern = "df[0-9]"))
#add newcolumn -- "key"
list_df<-lapply(list_df, function(df, newcol) {
df[[newcol]]<-seq(nrow(df))
return(df)
}, "key")
#merge function
MergeAllf <- function(x, y){
df <- merge(x, y, by= "key", all.x= T, all.y= T)
}
#pass list to merge funct
library(tidyverse)
data <- Reduce(MergeAllf, list_df)%>%
select(key, everything())#reorder or can drop "key"
data
key xVar1 xVar2 yVar1 yVar2 zVar1 zVar2
1 1 7 43 7 43 7 43
2 2 5 56 5 56 5 56
3 3 3 23 3 23 3 23
4 4 4 78 4 78 4 78
5 5 5 89 5 89 5 89
6 6 NA NA 8 45 8 45
7 7 NA NA 5 78 5 78
8 8 NA NA NA NA 6 56
9 9 NA NA NA NA 7 67
Upvotes: 1
Reputation: 1999
Solution 1
You can achieve this with rbindlist()
. Note that the column names will be the column names of the first data frame in the list:
library(data.table)
b1 = data.frame(rbindlist(a1))
> b1
xVar1 xVar2
1 7 43
2 5 56
3 3 23
4 4 78
5 5 89
6 7 43
7 5 56
8 3 23
9 4 78
10 5 89
11 8 45
12 5 78
13 7 43
14 5 56
15 3 23
16 4 78
17 5 89
18 8 45
19 5 78
20 6 56
21 7 67
Solution 2
Alternatively, you make all the columns have the same name, then bind by row:
b1 = lapply(a1, setNames, c("Var1","Var2"))
Now you can bind by rows:
b1 = do.call(dplyr::bind_rows, b1)
> b1
Var1 Var2
1 7 43
2 5 56
3 3 23
4 4 78
5 5 89
6 7 43
7 5 56
8 3 23
9 4 78
10 5 89
11 8 45
12 5 78
13 7 43
14 5 56
15 3 23
16 4 78
17 5 89
18 8 45
19 5 78
20 6 56
21 7 67
Upvotes: 0
Reputation: 2141
cbind.fill
from rowr
provides functionality for this and fills missing elements with NA
:
library(purrr)
library(rowr)
b1 <- purrr::reduce(a1,cbind.fill,fill=NA)
Upvotes: 1