Reputation: 2306
I'd like to bind rows of two df's by column names, if column doesn't exist in 2nd df, first should have NA's. Unfortunately dplyr::bind_rows
has a problem with duplicated column names. Could you help me how to solve this problem?
What I did so far?
First some reproducible data:
df1 <- data.frame(replicate(6,sample(0:1,10,rep=TRUE)))
df2 <- data.frame(replicate(3,sample(0:1,10,rep=TRUE)))
colnames(df1) <- c('A','A','A','B','C','E')
colnames(df2) <- c('A','B','C')
df1
A A A B C E
1 0 1 0 1 1 1
2 1 1 1 0 0 0
3 0 0 1 1 0 1
4 0 0 1 0 1 1
5 0 1 0 1 1 0
6 1 0 1 1 1 1
7 0 0 1 1 1 0
8 0 1 0 1 1 0
9 0 1 0 0 1 0
10 1 1 0 1 1 0
df2
A B C
1 1 1 0
2 0 1 1
3 1 1 0
4 1 0 0
5 0 0 1
6 0 0 1
7 0 1 1
8 0 0 0
9 0 0 1
10 0 1 1
What I'd like to get?
A A A B C E
1 0 1 0 1 1 1
2 1 1 1 0 0 0
3 0 0 1 1 0 1
4 0 0 1 0 1 1
5 0 1 0 1 1 0
6 1 0 1 1 1 1
7 0 0 1 1 1 0
8 0 1 0 1 1 0
9 0 1 0 0 1 0
10 1 1 0 1 1 0
11 1 1 1 1 0 NA
12 0 0 0 1 1 NA
13 1 1 1 1 0 NA
14 1 1 1 0 0 NA
15 0 0 0 0 1 NA
16 0 0 0 0 1 NA
17 0 0 0 1 1 NA
18 0 0 0 0 0 NA
19 0 0 0 0 1 NA
20 0 0 0 1 1 NA
I decided to use bind_rows
from dplyr
, but:
result <- bind_rows(mutate_all(df1, as.character), mutate_all(df2, as.character))
Error: Columns `A`, `A` must have unique names
Call `rlang::last_error()` to see a backtrace
Thanks for help in advance!
Upvotes: 4
Views: 1090
Reputation: 388907
Another option could be convert the dataframe to list objects and then cbind
them using cbind.fill
from rowr
package with fill = NA
library(rowr)
new_df <- do.call(cbind.fill, c(mapply(c, as.list(df1),
as.list(df2)[match(names(df1), names(df2))]), fill = NA))
but this loses column names which you can give it back from df1
.
names(new_df) <- names(df1)
new_df
# A A A B C E
#1 0 1 0 0 1 0
#2 0 1 0 0 0 0
#3 0 0 1 1 1 1
#4 0 0 1 0 1 0
#5 1 0 0 0 0 0
#6 0 0 1 0 1 1
#7 0 1 0 0 1 0
#8 0 1 0 0 0 0
#9 0 1 1 0 1 1
#10 1 1 1 0 1 1
#11 0 0 0 1 0 NA
#12 1 1 1 0 0 NA
#13 1 1 1 1 0 NA
#14 0 0 0 1 0 NA
#15 0 0 0 1 1 NA
#16 1 1 1 1 0 NA
#17 0 0 0 1 1 NA
#18 1 1 1 0 0 NA
#19 1 1 1 0 1 NA
#20 1 1 1 0 1 NA
Upvotes: 2
Reputation: 26343
An option using rbindlist
from data.table
library(data.table)
cols <- names(df1)[names(df1) %in% names(df2)]
out <- setDF(rbindlist(list(df1, setNames(df2[cols], cols)), fill = TRUE))
out
# A A A B C E
#1 0 1 0 1 1 0
#2 1 1 1 0 1 0
#3 0 1 1 0 1 0
#4 0 0 1 0 1 0
#5 1 0 0 1 0 1
#6 1 1 1 1 0 1
#7 0 0 0 1 0 0
#8 0 0 0 1 0 0
#9 1 1 0 1 0 0
#10 0 1 0 1 1 0
#11 0 0 0 1 1 NA
#12 1 1 1 1 0 NA
#13 0 0 0 0 1 NA
#14 0 0 0 0 1 NA
#15 0 0 0 0 1 NA
#16 1 1 1 1 1 NA
#17 0 0 0 0 0 NA
#18 1 1 1 1 1 NA
#19 0 0 0 1 0 NA
#20 0 0 0 1 1 NA
The part df2[cols]
would result in the following dataframe with unique names
# A A.1 A.2 B C
#1 0 0 0 1 1
#2 1 1 1 1 0
#3 0 0 0 0 1
# ...
That's why we call setNames(df2[cols], cols)
to immediately rename the columns. setDF
ensures that the result is a data.frame
and not a data.table
.
Using bind_rows
you'd end up with only four columns
head(bind_rows(df1, setNames(df2[cols], cols)), 3)
# A B C E
#1 1 1 0 0
#2 0 0 0 0
#3 0 0 1 1
Upvotes: 3