R. Martin

Reputation: 109

Delete vector entries at positions given by another vector

I have two vectors

a <- c(1:20)
b <- c(2,11,14)

I want to delete the entries of a at the positions listed in b (that is, I want the 2nd, 11th, and 14th entries deleted).

I've tried several methods, including:

c <- a[!a %in% b]

but that doesn't work.

Any suggestions? I've tried searching SO, but can only find answers about deleting based on values.

Upvotes: 3

Views: 114

Answers (2)

Gopala

Reputation: 10483

You can simply index into a and remove the elements at indices in b as follows:

a <- c(1:20)
b <- c(2,11,14)
a[-b]
 [1]  1  3  4  5  6  7  8  9 10 12 13 15 16 17 18 19 20
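
Because a is 1:20 in the question, values and positions happen to coincide, which can hide the difference between the two kinds of deletion. Here is a small sketch with made-up data (x and idx are hypothetical names, not from the question) where the values differ from the positions:

```r
# Hypothetical vector whose values do not match their positions
x <- c(10, 20, 30, 40, 50)
idx <- c(2, 4)                 # positions to drop

# Negative indexing drops the elements AT those positions
by_position <- x[-idx]

# %in% on the values drops elements whose VALUE is 2 or 4 (none here)
by_value <- x[!x %in% idx]

print(by_position)  # 10 30 50
print(by_value)     # 10 20 30 40 50
```

So a[-b] is the form to use when b holds positions rather than values.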

To check performance, I created a vector of 3.1 million entries and randomly sampled 100,000 positions to remove. As the timing shows, it is very fast (about 0.03 seconds elapsed):

a <- 1:3100000
b <- sample(a, 100000)
system.time(a[-b])
   user  system elapsed 
  0.024   0.003   0.027 

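Edited: Adding this extra check, based on the comments below by akrun and thelatemail, to handle the case where b might be empty (NULL or a zero-length vector). Without the check, a[-b] returns a zero-length vector in that case instead of all of a.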

a[if(length(b)) -b else TRUE]
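
If this comes up often, the same guard could be wrapped in a small helper. This is only a sketch: drop_at is a hypothetical name, not a base R function, and it assumes idx contains positive positions only.

```r
# Hypothetical helper: drop the elements of x at the positions in idx.
# Assumes idx holds positive positions; an empty or NULL idx drops nothing.
drop_at <- function(x, idx) {
  if (length(idx) == 0) {
    return(x)          # nothing to remove, return x unchanged
  }
  x[-idx]              # negative indexing removes those positions
}

a <- 1:20
kept  <- drop_at(a, c(2, 11, 14))   # 2nd, 11th and 14th entries removed
same  <- drop_at(a, integer(0))     # empty index: all of a comes back
same2 <- drop_at(a, NULL)           # NULL index: all of a comes back
```

The early return avoids the x[-integer(0)] case, which would otherwise give a zero-length result.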

Upvotes: 5

akrun

Reputation: 887118

The approach by @Gopala works in most cases, except when the 'b' vector is empty (NULL or zero-length). To make it a bit more general, we can build a logical index using seq_along(a) with %in%:

a[!seq_along(a) %in% b]
#[1]  1  3  4  5  6  7  8  9 10 12 13 15 16 17 18 19 20

Now, if we change 'b' to an empty vector:

b <- vector('integer')
a[-b]
#integer(0)
a[!seq_along(a) %in% b]
#[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

The former returns a vector of length 0, while the %in% approach returns the whole vector 'a'.

The other method is clearly more efficient, but if we need an approach that also works in the case I mentioned, this one can be used.

system.time(a[-b])
# user  system elapsed 
#  0.07    0.00    0.08 
system.time(a[!seq_along(a) %in% b])
#  user  system elapsed 
#  0.17    0.01    0.18 

The approach posted by @thelatemail makes the first approach general while keeping its speed:

system.time(a[if(length(b)==0) TRUE else -b])
# user  system elapsed 
#  0.05    0.00    0.05 

NOTE: Benchmark data from @Gopala's post.

Upvotes: 3
