user1692342
user1692342

Reputation: 5237

Merge and concatenate two data frames in R

I have two data frames A and B which look like:

firstDF:
col1 col2 id
A      1    2
B      5    3
C      6    4

secondDF:
col1 col2 id
A      1    2
E      15   5
F      16   6

Resultant DF:

col1 col2 id
A      1    2
B      5    3
C      6    4
E      15   5
F      16   6

The resultant data frame must contain all the rows from the two data frames. Incase there are rows which have the same id, it must be put in the resultant data frame only once.

I tried using the rbind function, but it returns with all the rows merged. I tried using the merge function with condition x.id=y.id, but the resultant data frame created had multiple columns namely x.col1, y.col1,x.col2, y.col2 and so on.

Upvotes: 2

Views: 6364

Answers (4)

Carson Moore
Carson Moore

Reputation: 1287

This may not be the most performant answer, but a quick and easy way to do it -- assuming that any duplicate rows are in fact exact duplicates (i.e., for any row in df1 where col_1 = X, if there exists a row in df2 where col_1 = X, all other columns are also identical between those two rows) -- would be to rbind them and get the unique results:

> df1
  col_1 col_2 id
1     A     1  2
2     B     5  3
3     C     6  4
> df2
  col_1 col_2 id
1     A     1  2
2     E    15  5
3     F    16  6
> unique(rbind(df1, df2))
  col_1 col_2 id
1     A     1  2
2     B     5  3
3     C     6  4
5     E    15  5
6     F    16  6

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270170

Using sqldf:

library(sqldf)

sqldf("select * from firstDF union select * from secondDF")

Note that union automatically removes duplicates.

Upvotes: 0

Sari Nahmad
Sari Nahmad

Reputation: 11

You can try the sqldf library. I'm not sure what kind of join. But it would go something like this:

Result =sqldf("select a.col1, a.col2, a.id from firstDF as a join secondDF as b on a.id=b.id")

Or

X=rbind(firstDB, secondDB)

Then filter out duplicates using the unique function.

Upvotes: 1

Alex A.
Alex A.

Reputation: 5586

You can do this with merge().

merge(df1, df2, by=c("col1", "col2", "id"), all.x=T, all.y=T)

This merges by all common variables, keeping all records in either data frame. Alternatively you can omit the by= argument and R will automatically use all common variables.

As @thelatemail mentioned in a comment, rather than individually specifying all.x=T and all.y=T, you can alternatively use all=T.

Upvotes: 5

Related Questions