Reputation: 621

Finding elements in a vector that are duplicated or that are not in another vector

I have the following situation:

vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

First question: which elements are duplicated? Answer: "A" and "C" for vec1, none for vec2.

Second question: which elements are in vec1 but not in vec2, irrespective of order (answer "E"),

and vice versa (answer "F")? I tried:

which(vec1 !=vec2)
which(vec2 !=vec1)

[1] 3 4 5 7
Warning message:
In vec1 != vec2 :
  longer object length is not a multiple of shorter object length

which is not what I expected.

Upvotes: 5

Answers (3)

sebastian-c

Reputation: 15395

For the first question, try ?duplicated

vec1.dup <- duplicated(vec1)
unique(vec1[vec1.dup])

[1] "A" "C"
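duplicated() flags every occurrence of a value after its first one, so a value that appears three times is flagged twice; that is why unique() is wrapped around the subset. A small sketch, where vec3 is a hypothetical vector added only to show the three-copies case:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# TRUE for the 2nd, 3rd, ... occurrence of each value
duplicated(vec1)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

# vec3 is hypothetical: "A" appears three times, so it is flagged twice
vec3 <- c("A", "B", "A", "A")
vec3[duplicated(vec3)]          # "A" "A"
unique(vec3[duplicated(vec3)])  # "A"

# vec2 has no repeated values, so the result is empty
unique(vec2[duplicated(vec2)])  # character(0)
```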

For the second, try ?setdiff. You want the values of vec2 that are not in vec1.

setdiff(vec2, vec1)
[1] "F"
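setdiff() is not symmetric, so the first part of the second question ("in vec1 but not in vec2") just swaps the arguments. A minimal sketch covering both directions; vec4 is a hypothetical vector used only to show that setdiff() returns unique values:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Values of vec1 that occur nowhere in vec2
setdiff(vec1, vec2)  # "E"

# Values of vec2 that occur nowhere in vec1
setdiff(vec2, vec1)  # "F"

# setdiff() drops repeats: "E" is returned once even though vec4 has it twice
vec4 <- c("E", "E", "A")
setdiff(vec4, vec2)  # "E"
```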

Upvotes: 5

IRTFM

Reputation: 263342

It appears that your second question is really "Why?" (I see you have already gotten good answers to "How do I do it correctly?").

which(vec1 !=vec2)
which(vec2 !=vec1)

Both return

[1] 3 4 5 7

The answer lies largely in the warning message:

Warning message:
In vec1 != vec2 :
  longer object length is not a multiple of shorter object length

When dyadic (binary) operators like "!=" are applied to vectors of different lengths, the recycling rules take over: the longer of the two vectors determines how many comparisons are made, and the shorter one is extended by repeating it from its start. With vec1 of length 7 and vec2 of length 5, you end up testing:

> # vec2 != vec1: vec2 is extended by recycling its first two elements, "A", "B"
> c("A", "B", "C", "D", "F", "A", "B") != c("A", "B", "D", "C", "E", "A", "C")
[1] FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
> # vec1 != vec2: the shorter vec2, now on the right, is extended the same way
> c("A", "B", "D", "C", "E", "A", "C") != c("A", "B", "C", "D", "F", "A", "B")
[1] FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
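The implicit recycling can be made explicit with rep_len(), which reproduces the same comparison without the warning. This is only a sketch to illustrate what != does internally; it is not a way to answer the set question:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Repeat vec2 from its start until it has length(vec1) elements
vec2_recycled <- rep_len(vec2, length(vec1))
vec2_recycled
# [1] "A" "B" "C" "D" "F" "A" "B"

# Same positions as which(vec1 != vec2), now compared element by element
which(vec1 != vec2_recycled)
# [1] 3 4 5 7
```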

Upvotes: 3

mindless.panda

Reputation: 4092

Elements in vec1 that are duplicated:

vec1[duplicated(vec1)]

[1] "A" "C"
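If you also want to know how often each value occurs, counting with table() is an alternative to duplicated(). Note that this is a different technique from the one above, and that table() returns the values in sorted order rather than in order of first appearance. A sketch:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")

# Count each distinct value (names are sorted)
counts <- table(vec1)

# Values that occur more than once
names(counts)[counts > 1]
# [1] "A" "C"
```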

Elements in vec1 that are not in vec2:

vec1[is.na(match(vec1,vec2))]

[1] "E"
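match() returns, for each element of its first argument, the position of that value in the second argument, or NA when it is absent. %in% performs the same lookup as a logical test, so the line above can also be written with it. Unlike setdiff(), both forms keep repeated values. A sketch:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Position of each vec1 value in vec2; NA where there is no match
match(vec1, vec2)
# [1]  1  2  4  3 NA  1  3

# Equivalent subset using %in%
vec1[!(vec1 %in% vec2)]
# [1] "E"
```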

And vice versa:

vec2[is.na(match(vec2,vec1))]

[1] "F"
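The logical index has to be built from the vector being subset, so that it has exactly one entry per element. Wrapping the lookup in a small function keeps the argument order consistent in both directions; not_in is a hypothetical helper name, not a base R function:

```r
vec1 <- c("A", "B", "D", "C", "E", "A", "C")
vec2 <- c("A", "B", "C", "D", "F")

# Hypothetical helper: elements of x that have no match in y.
# match(x, y) has length(x) entries, so the index always lines up with x.
not_in <- function(x, y) x[is.na(match(x, y))]

not_in(vec1, vec2)  # "E"
not_in(vec2, vec1)  # "F"
```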

Upvotes: 3
