Lucas Morin

Reputation: 398

Is there a base R function to find the common elements of two data frames across multiple columns?

I recently ran into the problem of finding the common elements of two data frames. The difficulty is that I need to match elements on multiple columns.

Say I have two data frames, df1 and df2, each with columns x and y holding different values (basically two sets of points in a plane).

My first idea was to rbind both sets of points and look for duplicates, but that wouldn't work directly, as one set of points may contain duplicates that are not in the other set.

My second idea was to build unique identifiers:

df1$Id = paste(df1$x,df1$y)
df2$Id = paste(df2$x,df2$y)

Then compare the identifiers:

common_points = df1[df1$Id %in% df2$Id,]

It almost worked, except for one unfortunate edge case: without a separator, [11,2] and [1,12] got the same identifier. I fixed this by passing a separator to paste (sep=' '). I also had another idea: inner joining the two data frames.
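The collision described above can be reproduced in a couple of lines. This is only a small sketch; the variable names are made up for illustration. Note that paste() already uses sep = " " by default, so the collision only appears when the parts are concatenated with no separator, for example with paste0().

```r
# Identifiers built with no separator: both points become "112"
id_no_sep_a <- paste0(11, 2)
id_no_sep_b <- paste0(1, 12)
id_no_sep_a == id_no_sep_b   # TRUE, the two points collide

# Identifiers built with an explicit separator stay distinct
id_sep_a <- paste(11, 2, sep = " ")   # "11 2"
id_sep_b <- paste(1, 12, sep = " ")   # "1 12"
id_sep_a == id_sep_b         # FALSE
```

A separator is enough for numeric columns. For character columns it can still collide if the separator itself may occur inside the values.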

Is there a base R function that does this properly, without having to worry about edge cases? (Would it be better to use another data format for a set of points?)

Upvotes: 0

Views: 836

Answers (1)

Edo

Reputation: 7818

I tried to create a reproducible example based on what you described, so I made up two data frames, each holding two coordinates.

If you use intersect, you will get the rows the two data frames have in common.

# reproducible example
set.seed(19)
df1 <- data.frame(x = sample(1:20, 100, replace = TRUE),
                  y = sample(1:20, 100, replace = TRUE))
df2 <- data.frame(x = sample(1:20, 100, replace = TRUE),
                  y = sample(1:20, 100, replace = TRUE))

# dplyr must be loaded: it supplies the data frame method for intersect()
library(dplyr)

# rows common to df1 and df2
intersect(df1, df2)

WARNING: intersect is a base R function, but it is the dplyr package that adds the ability to handle data frames. Without library(dplyr), the call above will not compare the data frames row by row.
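Since the question asks for base R only, the inner join it mentions can be sketched with merge(). This is an illustration under stated assumptions, not the method used in this answer: the small df1 and df2 below are made-up data, and unique() is applied first so that repeated points in either set do not multiply rows in the result.

```r
# Made-up example data: df1 contains the point (3, 4) twice
df1 <- data.frame(x = c(11, 1, 3, 3), y = c(2, 12, 4, 4))
df2 <- data.frame(x = c(1, 3, 7),     y = c(12, 4, 8))

# Inner join on both coordinate columns (base R, no packages needed).
# unique() drops repeated points before joining.
common_points <- merge(unique(df1), unique(df2), by = c("x", "y"))

common_points
```

Because the join compares x and y column by column, no identifier string is built, so the [11,2] versus [1,12] edge case does not arise.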

Upvotes: 2
