Reputation: 61
Let's say I have the following dfs
df1:
a b c d
1 2 3 4
4 3 3 4
9 7 3 4
df2:
a b c d
1 2 3 4
2 2 3 4
3 2 3 4
Now I want to merge both dfs conditional of column "a" to give me the following df
a b c d
1 2 3 4
4 3 3 4
9 7 3 4
2 2 3 4
3 2 3 4
In my dataset i tried using
merge <- merge(x = df1, y = df2, by = "a", all = TRUE)
However, while df1 has 50,000 entries and df2 has 100,000 entries and there are definately matching values in column a the merged df has over one million entries. I do not understand this. As I understand there should be max. 150,000 entries in the merged df and this is the case when no values in column a are equal between the two dfs.
Upvotes: 2
Views: 1388
Reputation: 102609
here is another base R solution using rbind
+ %in%
dfout <- rbind(df1,subset(df2,!a %in% df1$a))
such that
> rbind(df1,subset(df2,!a %in% df1$a))
a b c d
1 1 2 3 4
2 4 3 3 4
3 9 7 3 4
21 2 2 3 4
31 3 2 3 4
Upvotes: 0
Reputation: 887811
With tidyverse
, we can do bind_rows
and distinct
library(dplyr)
bind_rows(df1, df2) %>%
distinct
df1 <- structure(list(a = c(1, 4, 9), b = c(2, 3, 7), c = c(3, 3, 3),
d = c(4, 4, 4)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3),
d = c(4, 4, 4)), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1
Reputation: 21442
I think what you want to do is not merge
but rather rbind
the two dataframes and remove the duplicated
rows:
DATA:
df1 <- data.frame(a = c(1,4,9),
b = c(2,3,7),
c = c(3,3,3),
d = c(4,4,4))
df2 <- data.frame(a = c(1,2,3),
b = c(2,2,2),
c = c(3,3,3),
d = c(4,4,4))
SOLUTION:
Row-bind df1
and df2
:
df3 <- rbind(df1, df2)
Remove the duplicate rows:
df3 <- df3[!duplicated(df3), ]
RESULT:
df3
a b c d
1 1 2 3 4
2 4 3 3 4
3 9 7 3 4
5 2 2 3 4
6 3 2 3 4
Upvotes: 3