Reputation: 6314
I have this data.frame:
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)
and this vector:
vec <- LETTERS[sample(26, 10, replace = FALSE)]
I want to remove from df any row in which either df$id1 or df$id2 is not in vec.
Is there a faster way to find the indices of the rows that meet this condition than this:
rm.idx <- which(!apply(df,1,function(x) all(x %in% vec)))
Upvotes: 0
Views: 48
Reputation: 1867
I used dplyr with this script, which returns the rows that meet the removal condition:
library(dplyr)
df1 <- df %>% filter(!(id1 %in% vec) | !(id2 %in% vec))
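The filter above returns the rows to be removed. If the goal is the cleaned data instead, the complementary condition (both IDs in vec) keeps the rest. A minimal base-R sketch, assuming the df and vec from the question; it uses subset() rather than dplyr so it runs without extra packages, and the names removed and kept are only illustrative:

```r
# Same data as in the question
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# Rows to remove: at least one of the two IDs is not in vec
removed <- subset(df, !(id1 %in% vec) | !(id2 %in% vec))

# Rows to keep: both IDs are in vec (the logical complement)
kept <- subset(df, id1 %in% vec & id2 %in% vec)

# Every row lands in exactly one of the two subsets
stopifnot(nrow(removed) + nrow(kept) == nrow(df))
```

In dplyr the same "keep" step would be the negated filter condition; the base-R form just avoids the package dependency.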
Upvotes: 2
Reputation: 6314
Actually, this vectorized version is also fast:
rm.idx <- unique(which(!(df$id1 %in% vec) | !(df$id2 %in% vec)))
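A quick sketch to check that the vectorized condition selects the same rows as the apply() version from the question, assuming the question's df and vec. The unique() call is not needed here, since which() returns each index at most once. The sketch also shows a known pitfall when the indices are used for removal: df[-rm.idx, ] returns zero rows if rm.idx happens to be empty, whereas a logical index does not have that problem. The names keep, cleaned and empty.idx are hypothetical.

```r
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# Row-wise version from the question
rm.idx.apply <- which(!apply(df, 1, function(x) all(x %in% vec)))
# Vectorized version: one %in% call per column
rm.idx <- which(!(df$id1 %in% vec) | !(df$id2 %in% vec))
stopifnot(identical(unname(rm.idx.apply), unname(rm.idx)))

# Safer removal: keep rows via a logical vector instead of negative indices
keep <- df$id1 %in% vec & df$id2 %in% vec
cleaned <- df[keep, ]
stopifnot(nrow(cleaned) == nrow(df) - length(rm.idx))

# The pitfall: -integer(0) is integer(0), so this selects no rows at all
empty.idx <- integer(0)
stopifnot(nrow(df[-empty.idx, ]) == 0)
```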
Upvotes: 1
Reputation: 887881
Looping over the columns might be faster than looping over the rows. So, use lapply to loop over the columns and create a list of logical vectors with %in%, then use Reduce with | to check, for each row, whether any of those vectors is TRUE, and use the result to subset 'df':
df[Reduce(`|`, lapply(df, `%in%`, vec)),]
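To make the intermediate steps visible, here is a small sketch on a hypothetical three-row data frame (toy and toy.vec are made up for illustration). lapply() produces one logical vector per column, and Reduce() combines them element-wise, so each position in the result corresponds to one row:

```r
# Hypothetical toy data, small enough to check by hand
toy <- data.frame(id1 = c("A", "B", "C"),
                  id2 = c("A", "Z", "Z"),
                  stringsAsFactors = FALSE)
toy.vec <- c("A", "B")

# One logical vector per column: is each value in toy.vec?
hits <- lapply(toy, `%in%`, toy.vec)
# hits$id1 is TRUE TRUE FALSE, hits$id2 is TRUE FALSE FALSE

# `|` across columns: TRUE where at least one column matches
any.hit <- Reduce(`|`, hits)    # TRUE TRUE FALSE
# `&` across columns: TRUE only where every column matches
all.hit <- Reduce(`&`, hits)    # TRUE FALSE FALSE

toy[any.hit, ]   # rows 1 and 2
toy[all.hit, ]   # row 1 only
```

Because Reduce() folds over the whole list, the same expression works unchanged for any number of ID columns.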
If both IDs must be in vec (the condition the question asks for), replace | with &:
df[Reduce(`&`, lapply(df, `%in%`, vec)),]
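Whether the column-wise approach is actually faster depends on the data, so here is a rough sketch for measuring it yourself. The size n = 1e5 and the data frame big are arbitrary assumptions chosen only to make the difference measurable; no particular timings are claimed. The sketch also confirms that both approaches keep the same rows.

```r
set.seed(1)
n <- 1e5   # arbitrary size, chosen only for timing
big <- data.frame(id1 = LETTERS[sample(26, n, replace = TRUE)],
                  id2 = LETTERS[sample(26, n, replace = TRUE)],
                  stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# Row-wise: apply() converts big to a character matrix and calls the function once per row
t.rows <- system.time(
  keep.rows <- apply(big, 1, function(x) all(x %in% vec))
)

# Column-wise: one vectorized %in% call per column, combined with `&`
t.cols <- system.time(
  keep.cols <- Reduce(`&`, lapply(big, `%in%`, vec))
)

# Both approaches must select exactly the same rows
stopifnot(identical(unname(keep.rows), keep.cols))

# Elapsed seconds for each approach (machine-dependent)
print(c(rows = t.rows[["elapsed"]], cols = t.cols[["elapsed"]]))
```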
Upvotes: 1