Reputation: 5237
I have two data frames A and B which look like:
firstDF:
col1 col2 id
A 1 2
B 5 3
C 6 4
secondDF:
col1 col2 id
A 1 2
E 15 5
F 16 6
Resultant DF:
col1 col2 id
A 1 2
B 5 3
C 6 4
E 15 5
F 16 6
The resultant data frame must contain all the rows from the two data frames. Incase there are rows which have the same id, it must be put in the resultant data frame only once.
I tried using the rbind function, but it returns with all the rows merged. I tried using the merge function with condition x.id=y.id, but the resultant data frame created had multiple columns namely x.col1, y.col1,x.col2, y.col2 and so on.
Upvotes: 2
Views: 6364
Reputation: 1287
This may not be the most performant answer, but a quick and easy way to do it -- assuming that any duplicate rows are in fact exact duplicates (i.e., for any row in df1 where col_1 = X, if there exists a row in df2 where col_1 = X, all other columns are also identical between those two rows) -- would be to rbind them and get the unique results:
> df1
col_1 col_2 id
1 A 1 2
2 B 5 3
3 C 6 4
> df2
col_1 col_2 id
1 A 1 2
2 E 15 5
3 F 16 6
> unique(rbind(df1, df2))
col_1 col_2 id
1 A 1 2
2 B 5 3
3 C 6 4
5 E 15 5
6 F 16 6
Upvotes: 0
Reputation: 270170
Using sqldf:
library(sqldf)
sqldf("select * from firstDF union select * from secondDF")
Note that union automatically removes duplicates.
Upvotes: 0
Reputation: 11
You can try the sqldf
library. I'm not sure what kind of join.
But it would go something like this:
Result =sqldf("select a.col1, a.col2, a.id from firstDF as a join secondDF as b on a.id=b.id")
Or
X=rbind(firstDB, secondDB)
Then filter out duplicates using the unique
function.
Upvotes: 1
Reputation: 5586
You can do this with merge()
.
merge(df1, df2, by=c("col1", "col2", "id"), all.x=T, all.y=T)
This merges by all common variables, keeping all records in either data frame. Alternatively you can omit the by=
argument and R will automatically use all common variables.
As @thelatemail mentioned in a comment, rather than individually specifying all.x=T
and all.y=T
, you can alternatively use all=T
.
Upvotes: 5