Binding rows of two data frames (merging by column names) with duplicated column names

Question

I'd like to bind rows of two df's by column names, if column doesn't exist in 2nd df, first should have NA's. Unfortunately dplyr::bind_rows has a problem with duplicated column names. Could you help me how to solve this problem?

What I did so far?

First some reproducible data:

df1 <- data.frame(replicate(6,sample(0:1,10,rep=TRUE)))
df2 <- data.frame(replicate(3,sample(0:1,10,rep=TRUE)))
colnames(df1) <- c('A','A','A','B','C','E')
colnames(df2) <- c('A','B','C')

df1
   A A A B C E
1  0 1 0 1 1 1
2  1 1 1 0 0 0
3  0 0 1 1 0 1
4  0 0 1 0 1 1
5  0 1 0 1 1 0
6  1 0 1 1 1 1
7  0 0 1 1 1 0
8  0 1 0 1 1 0
9  0 1 0 0 1 0
10 1 1 0 1 1 0

df2
   A B C
1  1 1 0
2  0 1 1
3  1 1 0
4  1 0 0
5  0 0 1
6  0 0 1
7  0 1 1
8  0 0 0
9  0 0 1
10 0 1 1

What I'd like to get?

   A A A B C E
1  0 1 0 1 1 1
2  1 1 1 0 0 0
3  0 0 1 1 0 1
4  0 0 1 0 1 1
5  0 1 0 1 1 0
6  1 0 1 1 1 1
7  0 0 1 1 1 0
8  0 1 0 1 1 0
9  0 1 0 0 1 0
10 1 1 0 1 1 0
11 1 1 1 1 0 NA
12 0 0 0 1 1 NA
13 1 1 1 1 0 NA
14 1 1 1 0 0 NA
15 0 0 0 0 1 NA
16 0 0 0 0 1 NA
17 0 0 0 1 1 NA
18 0 0 0 0 0 NA
19 0 0 0 0 1 NA
20 0 0 0 1 1 NA

I decided to use bind_rows from dplyr, but:

result <- bind_rows(mutate_all(df1, as.character), mutate_all(df2, as.character))
Error: Columns `A`, `A` must have unique names
Call `rlang::last_error()` to see a backtrace

Thanks for help in advance!

markus · Accepted Answer

An option using rbindlist from data.table

library(data.table)
cols <- names(df1)[names(df1) %in% names(df2)]
out <- setDF(rbindlist(list(df1, setNames(df2[cols], cols)), fill = TRUE))
out
#   A A A B C  E
#1  0 1 0 1 1  0
#2  1 1 1 0 1  0
#3  0 1 1 0 1  0
#4  0 0 1 0 1  0
#5  1 0 0 1 0  1
#6  1 1 1 1 0  1
#7  0 0 0 1 0  0
#8  0 0 0 1 0  0
#9  1 1 0 1 0  0
#10 0 1 0 1 1  0
#11 0 0 0 1 1 NA
#12 1 1 1 1 0 NA
#13 0 0 0 0 1 NA
#14 0 0 0 0 1 NA
#15 0 0 0 0 1 NA
#16 1 1 1 1 1 NA
#17 0 0 0 0 0 NA
#18 1 1 1 1 1 NA
#19 0 0 0 1 0 NA
#20 0 0 0 1 1 NA

The part df2[cols] would result in the following dataframe with unique names

#   A A.1 A.2 B C
#1  0   0   0 1 1
#2  1   1   1 1 0
#3  0   0   0 0 1
# ...

That's why we call setNames(df2[cols], cols) to immediately rename the columns. setDF ensures that the result is a data.frame and not a data.table.

Using bind_rows you'd end up with only four columns

head(bind_rows(df1, setNames(df2[cols], cols)), 3)
#  A B C E
#1 1 1 0 0
#2 0 0 0 0
#3 0 0 1 1

Binding rows of two data frames (merging by column names) with duplicated column names

Answers (2)

Related Questions