Reputation: 109
I have two vectors
a <- c(1:20)
b <- c(2,11,14)
I want to delete the entries in the a vector based on the vector entries in b (I want the 2nd, 11th, and 14th entries deleted).
I've tried several methods, including:
c <- a[!a %in% b]
but that doesn't work.
Any suggestions? I've tried searching SO, but can only find deleting based on values.
Upvotes: 3
Views: 114
Reputation: 10483
You can simply index into a and remove the elements at the indices in b as follows:
a <- c(1:20)
b <- c(2,11,14)
a[-b]
[1] 1 3 4 5 6 7 8 9 10 12 13 15 16 17 18 19 20
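To show how removing by position differs from the value-based a[!a %in% b] tried in the question, here is a minimal sketch. The vector x and the positions in drop are made up for illustration, with values that deliberately do not match their positions.

```r
# Hypothetical data: values deliberately differ from their positions
x <- c(10, 2, 30, 11, 14)
drop <- c(2, 5)              # positions to remove

# Negative indexing removes by position (2nd and 5th entries)
by_position <- x[-drop]
print(by_position)           # 10 30 11

# %in% on the values removes by value, which is a different operation:
# only the value 2 matches anything in drop
by_value <- x[!x %in% drop]
print(by_value)              # 10 30 11 14
```

With a <- 1:20 the two forms happen to give the same result, because every value equals its own position.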
To check performance, I created a vector of 3.1 million entries and randomly sampled 100,000 positions to remove. As the timing shows, it is very fast (about 0.03 seconds elapsed).
a <- 1:3100000
b <- sample(a, 100000)
system.time(a[-b])
user system elapsed
0.024 0.003 0.027
Edit: Adding this extra check, based on the comments below by akrun and thelatemail, to handle the case where b might be NULL or empty.
a[if(length(b)) -b else TRUE]
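If this guard is needed in several places, it could be wrapped in a small function. This is only a sketch: drop_positions is a hypothetical name, not something from base R or from the comments.

```r
# Hypothetical helper: drop the elements of x at the given positions,
# returning x unchanged when no positions are supplied
drop_positions <- function(x, idx) {
  if (length(idx)) x[-idx] else x
}

a <- 1:20
drop_positions(a, c(2, 11, 14))  # removes the 2nd, 11th and 14th entries
drop_positions(a, integer(0))    # no positions given: all 20 entries kept
drop_positions(a, NULL)          # length(NULL) is 0: all 20 entries kept
```

The function tests the length of idx before negating it, so an empty index never reaches the subsetting step.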
Upvotes: 5
Reputation: 887118
The approach by @Gopala works in most cases, except when the 'b' vector is NULL or empty. To make it a bit more general, we can build the logical condition using seq_along(a) with %in%:
a[!seq_along(a) %in% b]
#[1] 1 3 4 5 6 7 8 9 10 12 13 15 16 17 18 19 20
Now, if we change 'b' to an empty vector:
b <- vector('integer')
a[-b]
#integer(0)
a[!seq_along(a) %in% b]
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The former returns a vector of length 0, while the %in% approach returns the whole vector 'a'.
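The difference can be traced step by step. The sketch below reuses a and an empty b and introduces the intermediate names neg and keep purely for illustration.

```r
a <- 1:20
b <- integer(0)

# Negating an empty vector still gives an empty vector, and an
# empty index selects nothing
neg <- -b
length(neg)       # 0
length(a[neg])    # 0

# With %in%, no position matches the empty b, so every position is kept.
# Note that ! applies to the whole %in% expression.
keep <- !seq_along(a) %in% b
all(keep)         # TRUE
length(a[keep])   # 20
```

So a[-b] does not mean "remove nothing" when b is empty. It means "select with an empty index", which is why the result has length 0.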
The negative-indexing method is more efficient, but if we need an approach that also handles the empty case mentioned above, this one can be used.
system.time(a[-b])
# user system elapsed
# 0.07 0.00 0.08
system.time(a[!seq_along(a) %in% b])
# user system elapsed
# 0.17 0.01 0.18
The approach posted by @thelatemail, which makes the first approach general:
system.time(a[if(length(b)==0) TRUE else -b])
# user system elapsed
# 0.05 0.00 0.05
NOTE: Benchmark data from @Gopala's post.
Upvotes: 3