dan

Reputation: 6314

Removing rows from a data frame

I have this data.frame:

set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)

and this vector:

vec <- LETTERS[sample(26, 10, replace = FALSE)]

I want to remove from df any row in which either df$id1 or df$id2 is not in vec.

Is there a faster way to find the indices of the rows that meet this condition than this:

rm.idx <- which(!apply(df,1,function(x) all(x %in% vec)))
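Once the indices are found, a common next step is df[-rm.idx, ]. A minimal sketch on a small hypothetical data frame (the toy data and the names keep, kept1, kept2, none and all.gone are made up for illustration) shows one edge case worth guarding against: if no row fails the test, rm.idx is empty and negative indexing drops every row. Subsetting with the logical vector itself avoids this.

```r
# Small hypothetical example data (not the question's random data)
df <- data.frame(id1 = c("A", "B", "C", "D"),
                 id2 = c("B", "Z", "A", "Y"),
                 stringsAsFactors = FALSE)
vec <- c("A", "B", "C")

# Row-wise test from the question: TRUE when both ids are in vec
keep <- apply(df, 1, function(x) all(x %in% vec))
rm.idx <- which(!keep)          # rows 2 and 4

# Dropping by negative index works while rm.idx is non-empty ...
kept1 <- df[-rm.idx, ]

# ... but an empty index removes every row, since -integer(0) is integer(0)
none <- integer(0)
all.gone <- df[-none, ]         # 0 rows, not 4

# Subsetting with the logical vector avoids that edge case
kept2 <- df[keep, ]
```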

Upvotes: 0

Views: 48

Answers (3)

Mateusz1981

Reputation: 1867

I used dplyr with this script, which keeps only the rows where both id1 and id2 are in vec:

library(dplyr)
df1 <- df %>% filter(id1 %in% vec, id2 %in% vec)
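For comparison, here is a base R sketch that keeps only the rows where both id1 and id2 are in vec, which is what the question asks for, without needing dplyr. The toy data and the names df1 and df2 are hypothetical. Note that ?subset recommends the bracket form over subset() inside functions.

```r
# Hypothetical toy data
df <- data.frame(id1 = c("A", "B", "C", "D"),
                 id2 = c("B", "Z", "A", "Y"),
                 stringsAsFactors = FALSE)
vec <- c("A", "B", "C")

# Keep rows where both id1 and id2 are in vec (base R, no dplyr needed)
df1 <- subset(df, id1 %in% vec & id2 %in% vec)

# The same with ordinary logical indexing
df2 <- df[df$id1 %in% vec & df$id2 %in% vec, ]
```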

Upvotes: 2

dan

Reputation: 6314

Actually, this vectorized version is also fast:

rm.idx <- which(!(df$id1 %in% vec) | !(df$id2 %in% vec))
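A quick way to check that a vectorized rewrite matches the original apply() version is to compare the indices on the question's data. This sketch uses only base R; the names idx.apply, idx.vec and same are illustrative. The exact rows sampled may differ between R versions, since sample() changed its default algorithm in R 3.6.0, but the two methods should agree either way.

```r
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# Row-wise version from the question
idx.apply <- which(!apply(df, 1, function(x) all(x %in% vec)))

# Vectorized version: one %in% call per column, combined element-wise
idx.vec <- which(!(df$id1 %in% vec) | !(df$id2 %in% vec))

# Same indices; which() returns each index at most once,
# so no de-duplication is needed
same <- identical(unname(idx.apply), idx.vec)
```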

Upvotes: 1

akrun

Reputation: 887881

Looping over the columns is usually faster than looping over the rows: apply() first converts the data frame to a matrix and then calls the function once per row. So use lapply() to loop over the columns and build a list of logical vectors with %in%, then use Reduce() with | to check, for each row, whether any of those vectors is TRUE, and use the result to subset df:

df[Reduce(`|`, lapply(df, `%in%`, vec)),]

If both values must be in vec (as the question asks), replace | with &:

df[Reduce(`&`, lapply(df, `%in%`, vec)),]
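To see what the Reduce() idiom does step by step, here is a sketch on a small hypothetical data frame (the toy data and the intermediate names hits, any.in, both.in and kept are illustrative). lapply() produces one logical vector per column, and Reduce() folds them together element-wise with | or &.

```r
# Hypothetical toy data
df <- data.frame(id1 = c("A", "B", "C", "D"),
                 id2 = c("B", "Z", "A", "Y"),
                 stringsAsFactors = FALSE)
vec <- c("A", "B", "C")

# Step 1: one logical vector per column (lapply passes vec as %in%'s table)
hits <- lapply(df, `%in%`, vec)
# hits$id1: TRUE TRUE TRUE FALSE; hits$id2: TRUE FALSE TRUE FALSE

# Step 2: fold the vectors together element-wise
any.in  <- Reduce(`|`, hits)   # TRUE TRUE TRUE FALSE
both.in <- Reduce(`&`, hits)   # TRUE FALSE TRUE FALSE

# Step 3: subset; both.in keeps rows whose ids are all in vec
kept <- df[both.in, ]
```

Because Reduce() folds over every column of df, the same expression works unchanged for more id columns, as long as df contains only the columns to test.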

Upvotes: 1
