Reputation: 717
For example, df1 looks like below -
X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000
df2 looks like below -
X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase
Orange India Orange Sale 2000
My output should look like -
X1 X2 X3 X4 X5.x X5.y
Apple Belgium Red Purchase 100 10000
Guava Germany Green Sale 200 20000
Grape Italy Purple Purchase 500 NA
Here multiple operations are involved -
Pick the rows present in 1 and not in other, vice versa
Pick the mismatches in X5 column (X5 is my target column) when the first 4 column matches
I do not want the matches.
I tried a combination of inner_join, full_join and anti_join of both to obtain the part1. How do I perform the second part? Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?
I don't want to use sqldf. I know this can be achieved in SQL. I want to do this in dplyr. Any help is much appreciated.
TIA.
Upvotes: 4
Views: 15381
Reputation: 1219
(df1
%>% anti_join(., df2, by = c("X1", "X2", "X3", "X4","X5"))
%>% left_join(., df2, by = c("X1", "X2", "X3", "X4"))
)
X1 X2 X3 X4 X5.x X5.y
1 Apple Belgium Red Purchase 100 10000
2 Guava Germany Green Sale 200 20000
3 Grape Italy Purple Purchase 500 NA
Upvotes: 1
Reputation: 146129
left_join(df1, df2, by = c("X1", "X2", "X3", "X4")) %>%
filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))
# X1 X2 X3 X4 X5.x X5.y
# 1 Apple Belgium Red Purchase 100 10000
# 2 Guava Germany Green Sale 200 20000
# 3 Grape Italy Purple Purchase 500 NA
Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?
Yes, I think you could do this with non-equi joins in data.table
. Or sqldf
, as you mention.
I want to do this in dplyr.
dplyr
only joins on equality. So you join and then filter.
Using this data:
df1 = read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000", header = T)
df2 = read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase NA
Orange India Orange Sale 2000", header = T)
Upvotes: 5