Reputation: 1600
When trying to exclude specific numbers from a sequence, I noticed that which(!(0:10 %in% 2:3)) returns a different result (it removes 3 and 4 and adds 11) than setdiff(0:10,2:3) or which(!(1:10 %in% 2:3)).
which(!(1:10 %in% 2:3))
[1] 1 4 5 6 7 8 9 10
which(!(0:10 %in% 2:3))
[1] 1 2 5 6 7 8 9 10 11
setdiff(0:10,2:3)
[1] 0 1 4 5 6 7 8 9 10
This seems to be a simple logic problem, but I can't figure out what causes it. Also, is setdiff as fast as which(!()) for large sequences?
Upvotes: 1
Views: 544
Reputation: 1600
Summarizing the comments: which gives the positions of the TRUE elements, not the elements themselves. The 1:10 example in the question hides this because each value in 1:10 equals its own position, but these examples make it obvious:
which(!(10:20 %in% 12:13))
[1] 1 2 5 6 7 8 9 10 11
0:10 %in% 2:3
[1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
So to select the elements themselves, use those positions to index the vector:
myseq<-0:10
myseq[which(!(0:10 %in% 2:3))]
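To make the position-versus-value distinction concrete, here is a small sketch. The vector 10:20 and the name excluded are illustrative choices, not from the question. It also shows one behavioural difference worth knowing before swapping the approaches: setdiff returns unique values, so it is only equivalent when the input has no duplicates.

```r
# Illustrative values, chosen so that positions and values differ
myseq    <- 10:20
excluded <- 12:13

keep <- !(myseq %in% excluded)  # logical vector, one entry per element of myseq

which(keep)             # positions of the kept elements
myseq[keep]             # the kept elements, via logical indexing
myseq[which(keep)]      # the same elements, via integer positions
setdiff(myseq, excluded) # the same elements here, since myseq has no duplicates

# setdiff() also removes duplicates; logical indexing keeps them
dups <- c(1, 1, 2, 3)
dups[!(dups %in% 2)]    # keeps both 1s
setdiff(dups, 2)        # keeps a single 1
```

Logical indexing (myseq[keep]) needs no call to which at all. The which form is mainly useful when the positions themselves are wanted.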
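Regarding speed, indexing the vector (with or without which) is faster than setdiff in this benchmark, by roughly a factor of two on the median. The two indexing variants are not significantly different from each other (same cld group):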
myseq <- 0:1000000
microbenchmark::microbenchmark(
  wh  = myseq[which(!(0:1000000 %in% 2:3))],
  sd  = setdiff(0:1000000, 2:3),
  seq = (0:1000000)[!0:1000000 %in% 2:3]
)
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval cld
   wh 18.62157 18.85489 25.17644 25.37830 26.89162 152.9487   100   a
   sd 36.09655 42.83383 50.22088 44.16595 45.96227 178.2949   100   b
  seq 17.51332 17.98346 25.00993 24.39265 25.91137 174.1875   100   a
Upvotes: 1