Reputation: 398
I recently ran into the problem of finding the common elements of two data frames. Specifically, I need to find common elements based on multiple columns.
Say I have two data frames, df1 and df2, that both have columns x and y with different values (basically two sets of points in a plane).
My first idea was to rbind both sets of points and find duplicates, but that wouldn't work directly, as one set of points may contain duplicates that are not in the other set.
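To make the issue with the rbind idea concrete, here is a minimal sketch on made-up data (the values in df1 and df2 below are invented for illustration). It assumes only base R, where duplicated() compares data frame rows. Dropping within-set duplicates with unique() before binding is one possible workaround, not necessarily the best one.

```r
# Made-up points: (1, 5) appears twice in df1 but never in df2
df1 <- data.frame(x = c(1, 1, 2), y = c(5, 5, 6))
df2 <- data.frame(x = c(2, 3), y = c(6, 7))

# Naive approach: bind everything and look for duplicated rows.
# This wrongly reports (1, 5), which is only duplicated inside df1.
both  <- rbind(df1, df2)
naive <- both[duplicated(both), ]

# Workaround: remove duplicates within each set first, so any
# remaining duplicate must come from the other set.
dedup  <- rbind(unique(df1), unique(df2))
common <- dedup[duplicated(dedup), ]

nrow(naive)   # 2 rows: (1, 5) and (2, 6)
nrow(common)  # 1 row:  (2, 6)
```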
My second idea was to build unique identifiers:
df1$Id = paste(df1$x,df1$y)
df2$Id = paste(df2$x,df2$y)
Then compare the identifiers:
common_points = df1[df1$Id %in% df2$Id,]
It almost worked perfectly, except for one unfortunate edge case: with my method, [11,2] and [1,12] got the same identifier. I corrected this by adding a separator in the paste call (the sep=' ' option). I also had another idea: inner joining the two data frames.
Is there a base R function that would let me do this properly without worrying about edge cases? (Would it be better to use another data format for a set of points?)
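The collision can be reproduced with a small sketch. It assumes the identifiers were built without any separator (paste0, or paste with sep = ""); note that the default separator of paste is already a single space. The data frame pts is made up for illustration.

```r
# Two distinct points that collide when pasted without a separator
pts <- data.frame(x = c(11, 1), y = c(2, 12))

ids_nosep <- paste0(pts$x, pts$y)            # "112"  "112"
ids_sep   <- paste(pts$x, pts$y, sep = " ")  # "11 2" "1 12"

anyDuplicated(ids_nosep) > 0  # TRUE: the two points share an identifier
anyDuplicated(ids_sep) > 0    # FALSE: the separator keeps them apart
```

A separator only helps as long as it cannot occur inside the values themselves, which holds for numeric coordinates but not necessarily for arbitrary strings.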
Upvotes: 0
Views: 836
Reputation: 7818
I tried to create a reproducible example related to what you explained. So I made up two dataframes with two coordinates.
If you use intersect, you will find the rows in common.
# reproducible example
set.seed(19)
df1 <- data.frame(x = sample(1:20, 100, replace = TRUE),
                  y = sample(1:20, 100, replace = TRUE))
df2 <- data.frame(x = sample(1:20, 100, replace = TRUE),
                  y = sample(1:20, 100, replace = TRUE))
# MUST CALL DPLYR!
library(dplyr)
# your solution
intersect(df1, df2)
WARNING: intersect is a base R function. However, the dplyr package adds the ability to handle data frames, comparing them row by row.
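If loading dplyr is not an option, the inner join mentioned in the question can be sketched in base R with merge. This is only one possible approach, shown on made-up data (the small df1 and df2 below are invented for illustration). Calling unique() first is an assumption about the desired result: it keeps each common point once, similar to what intersect returns.

```r
# Made-up inputs; df1 contains the point (3, 4) twice
df1 <- data.frame(x = c(11, 1, 3, 3), y = c(2, 12, 4, 4))
df2 <- data.frame(x = c(1, 3, 7),     y = c(12, 4, 7))

# Inner join on both columns. unique() is applied first so that
# duplicated points are not multiplied by the join.
common <- merge(unique(df1), unique(df2), by = c("x", "y"))
common  # contains the points (1, 12) and (3, 4)
```

Because the join compares the columns themselves rather than pasted strings, [11,2] and [1,12] cannot be confused.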
Upvotes: 2