As I understand it, setdiff() compares two vectors and gives the elements that occur in one vector but do not occur in the other. If that's so, then given these vectors...
thing1 <- c(1,2,3)
thing2 <- c(2,3,4)
thing3 <- c(1,2,3)
...here are my results:
> setdiff(thing1, thing2)
[1] 1
> setdiff(thing2, thing3)
[1] 4
> setdiff(thing1, thing3)
numeric(0)
Shouldn't the comparison of thing1 and thing2 produce the same result as comparing thing2 and thing3? And how can I get an 'outer join' sort of result (a symmetric set difference), where I can see all the elements of the union of thing1 and thing2 that are missing from one of them? I'd prefer to know the base R functionality, but would also appreciate a data.table approach. Thanks in advance.
setdiff provides an asymmetric difference: setdiff(x, y) returns the elements of x that are not in y. In this case, it does exactly what it says on the tin.
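Roughly speaking, for plain atomic vectors setdiff behaves like the sketch below: keep the elements of x that do not appear in y, then drop duplicates. This is an illustration built on %in% and unique(), not base R's actual source, and my_setdiff is a made-up name.

```r
# Illustrative sketch only; my_setdiff is a hypothetical name, not part of base R.
# Assumes plain atomic vectors (numeric, character, ...).
my_setdiff <- function(x, y) {
  # keep elements of x that are absent from y, then remove duplicates
  unique(x[!(x %in% y)])
}

thing1 <- c(1, 2, 3)
thing2 <- c(2, 3, 4)

my_setdiff(thing1, thing2)   # [1] 1
my_setdiff(c(1, 1, 2), 2)    # [1] 1, duplicates collapse just as with setdiff()
```

Note that only x is scanned, which is exactly why swapping the arguments asks a different question.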
Shouldn't the comparison of thing1 and thing2 produce the same result as comparing thing2 and thing3?
Well, no. But it will produce the same result as comparing thing3 and thing2, because thing1 and thing3 contain the same values. The order of the arguments matters. Consider your first two examples:
The first example asks: what is in thing1 that is not in thing2?
> setdiff(thing1, thing2)
[1] 1
You could try the reverse: what is in thing2 that is not in thing1?
> setdiff(thing2, thing1)
[1] 4
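Since thing1 and thing3 hold the same values, the claim about comparing thing3 and thing2 is easy to check in base R. The sketch also pastes the two one-sided differences together, which already lists every element that is not shared; the names only_in_1 and only_in_2 are just illustrative.

```r
thing1 <- c(1, 2, 3)
thing2 <- c(2, 3, 4)
thing3 <- c(1, 2, 3)

identical(thing1, thing3)   # [1] TRUE
setdiff(thing3, thing2)     # [1] 1, same as setdiff(thing1, thing2)
setdiff(thing2, thing3)     # [1] 4, the reversed question

# One direction at a time, then both together
only_in_1 <- setdiff(thing1, thing2)
only_in_2 <- setdiff(thing2, thing1)
c(only_in_1, only_in_2)     # [1] 1 4
```

The two one-sided results can never overlap, so concatenating them introduces no duplicates.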
But it looks to me like the question you're asking is:
What elements of thing1 and thing2 are not shared?
Which is the same as:
What elements are in the union of thing1 and thing2, but not in the intersection of the two?
> setdiff(union(thing1, thing2), intersect(thing1, thing2))
[1] 1 4
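If you need this often, a small wrapper keeps it readable. symdiff is a hypothetical helper name, not a base R function; this is a minimal sketch assuming plain atomic vectors such as the numeric ones above.

```r
# Hypothetical helper (not in base R): symmetric set difference,
# i.e. elements in either vector but not in both.
symdiff <- function(x, y) {
  setdiff(union(x, y), intersect(x, y))
}

thing1 <- c(1, 2, 3)
thing2 <- c(2, 3, 4)
thing3 <- c(1, 2, 3)

symdiff(thing1, thing2)                                     # [1] 1 4
setequal(symdiff(thing1, thing2), symdiff(thing2, thing1))  # [1] TRUE
symdiff(thing1, thing3)                                     # numeric(0): identical vectors share every element
```

Unlike setdiff, the set of elements returned does not depend on argument order, although the order in which they are listed does (symdiff(thing2, thing1) gives 4 1).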