How to do a conditional join in R with dplyr?

Question

For example, df1 looks like below -

X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   100 
Guava   Germany   Green Sale       200
Grape   Italy     Purple Purchase   500
Orange India   Orange   Sale       2000

df2 looks like below -

 X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   10000 
Guava   Germany   Green Sale       20000
Grape   Italy     Purple Purchase   
Orange India   Orange   Sale       2000

My output should look like -

 X1         X2     X3     X4         X5.x  X5.y
Apple   Belgium   Red   Purchase   100     10000
Guava   Germany   Green Sale       200    20000
Grape   Italy     Purple Purchase   500   NA

Here multiple operations are involved -

Pick the rows present in 1 and not in other, vice versa
Pick the mismatches in X5 column (X5 is my target column) when the first 4 column matches
I do not want the matches.

I tried a combination of inner_join, full_join and anti_join of both to obtain the part1. How do I perform the second part? Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?

I don't want to use sqldf. I know this can be achieved in SQL. I want to do this in dplyr. Any help is much appreciated.

TIA.

Gregor Thomas · Accepted Answer

left_join(df1, df2, by = c("X1", "X2", "X3", "X4")) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))
#      X1      X2     X3       X4 X5.x  X5.y
# 1 Apple Belgium    Red Purchase  100 10000
# 2 Guava Germany  Green     Sale  200 20000
# 3 Grape   Italy Purple Purchase  500    NA

Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?

Yes, I think you could do this with non-equi joins in data.table. Or sqldf, as you mention.

I want to do this in dplyr.

dplyr only joins on equality. So you join and then filter.

Using this data:

df1 = read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   100 
Guava   Germany   Green Sale       200
Grape   Italy     Purple Purchase   500
Orange India   Orange   Sale       2000", header = T)

df2 = read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   10000 
Guava   Germany   Green Sale       20000
Grape   Italy     Purple Purchase   NA
Orange India   Orange   Sale       2000", header = T)

How to do a conditional join in R with dplyr?

Answers (2)

Related Questions