Reputation: 31
I am trying to use anti-join exactly as I have done many times to establish which rows across two datasets do not have matches for two specific columns. For some reason I keep getting 0 rows in the result and I can't understand why.
Below are two dummy df's containing the two columns I am trying to compare - you will see one is missing an entry (df1, SITE no2, PLOT no 8) - so when I use anti-join to compare the two dfs, this entry should be returned, but I am just getting a result of 0.
a<- seq(1:3)
SITE <- rep(a, times = c(16,15,1))
PLOT <- c(1:16,1:7,9:16,1)
df1 <- data.frame(SITE,PLOT)
SITE <- rep(a, times = c(16,16,1))
PLOT <- c(rep(1:16,2),1)
df2 <- data.frame(SITE,PLOT)
df1 df2
SITE PLOT SITE PLOT
1 1 1 1
1 2 1 2
1 3 1 3
1 4 1 4
1 5 1 5
1 6 1 6
1 7 1 7
1 9 1 8
1 10 1 9
1 11 1 10
1 12 1 11
1 13 1 12
1 14 1 13
1 15 1 14
1 16 1 15
1 1 1 16
2 2 2 1
2 3 2 2
2 4 2 3
2 5 2 4
2 6 2 5
2 7 2 6
2 8 2 7
2 9 2 8
2 10 2 9
2 11 2 10
2 12 2 11
2 13 2 12
2 14 2 13
2 15 2 14
2 16 2 15
3 1 2 16
3 1
a <- anti_join(df1, df2, by=c('SITE', 'PLOT'))
a
<0 rows> (or 0-length row.names)
I'm sure the answer is obvious but I can't see it.
Upvotes: 1
Views: 1020
Reputation:
The answer can be found in the help file.
anti_join() return all rows from x without a match in y.
So reversing the input for df1
and df2
will give you what you expect.
anti_join(df2, df1, by=c('SITE', 'PLOT'))
# SITE PLOT
# 1 2 8
Upvotes: 3