Reputation: 621
I have the following situation:
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")
First question: which elements are duplicated? Answer: "A" and "C" for vec1, none for vec2.
Second question: identify which elements are in vec1 but not in vec2, irrespective of order (answer "E"),
or vice versa (answer "F"). I tried:
which(vec1 !=vec2)
which(vec2 !=vec1)
[1] 3 4 5 7
Warning message:
In vec1 != vec2 :
longer object length is not a multiple of shorter object length
which is not what I expected....
Upvotes: 5
Views: 4422
Reputation: 15395
For the first question, try ?duplicated
vec1.dup <- duplicated(vec1)
unique(vec1[vec1.dup])
[1] "A" "C"
For the second, try ?setdiff. You want the values of vec2 that are not in vec1.
setdiff(vec2, vec1)
[1] "F"
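To cover both vectors and both directions of the comparison in one place, here is a minimal sketch using only base R. The variable names `dup1`, `dup2`, `only_in_vec1` and `only_in_vec2` are just illustrative choices, not anything from the question.

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Values that occur more than once in each vector
dup1 <- unique(vec1[duplicated(vec1)])   # "A" "C"
dup2 <- unique(vec2[duplicated(vec2)])   # character(0): vec2 has no duplicates

# Set differences in each direction; order and repeats are ignored
only_in_vec1 <- setdiff(vec1, vec2)      # "E"
only_in_vec2 <- setdiff(vec2, vec1)      # "F"
```

Note that setdiff() is not symmetric: the first argument is the vector whose extra values you want back.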
Upvotes: 5
Reputation: 263342
It appears that your (second) question is really "Why?" (I see that you have already gotten good answers to the "How?").
which(vec1 !=vec2)
which(vec2 !=vec1)
Both return
[1] 3 4 5 7
The answer lies in large part in the warning message:
Warning message:
In vec1 != vec2 :
longer object length is not a multiple of shorter object length
When binary operators like "!=" are applied to vectors of unequal length, the recycling rules take over: the longer of the two vectors determines the length of the comparison, and the shorter one is extended by repeating its elements from the start. You end up testing:
> c("A", "B", "C", "D", "F", "A", "B") != c("A", "B", "D", "C", "E", "A", "C")
#.... shorter vector (left side) extended with its first two elements: "A", "B"
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE
> c("A", "B", "D", "C", "E","A", "C") != c("A", "B", "C", "D", "F", "A", "B")
#.... shorter vector (right side) extended with its first two elements: "A", "B"
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE
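If it helps to see the recycling spelled out, the following sketch does by hand what R does implicitly, using base R's rep_len(). The names `vec2_recycled`, `mismatch` and `same_as_implicit` are only illustrative.

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Repeat vec2 from the start until it is as long as vec1
vec2_recycled <- rep_len(vec2, length(vec1))   # "A" "B" "C" "D" "F" "A" "B"

# Element-by-element comparison, now between vectors of equal length
mismatch <- which(vec1 != vec2_recycled)       # 3 4 5 7

# Same positions as the implicit version, minus the warning
same_as_implicit <- identical(mismatch, suppressWarnings(which(vec1 != vec2)))
```

So "!=" answers "do these two vectors differ at position i?", which is a different question from "is this value present anywhere in the other vector?".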
Upvotes: 3
Reputation: 4092
Elements in vec1 that are duplicated:
vec1[duplicated(vec1)]
[1] "A" "C"
Elements in vec1 that are not in vec2:
vec1[is.na(match(vec1,vec2))]
[1] "E"
And vice versa:
vec2[is.na(match(vec2,vec1))]
[1] "F"
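The same membership test can also be written with %in%, which some find easier to read than is.na(match(...)). This is only a sketch of an equivalent form; `not_in_vec2` and `not_in_vec1` are illustrative names.

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# x %in% y is TRUE for each element of x that appears anywhere in y
not_in_vec2 <- vec1[!(vec1 %in% vec2)]   # "E"
not_in_vec1 <- vec2[!(vec2 %in% vec1)]   # "F"
```

Unlike setdiff(), indexing this way keeps repeats: a value that occurs twice in vec1 and never in vec2 would be returned twice, so wrap the result in unique() if that matters.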
Upvotes: 3