KGB91

Reputation: 679

Subset multiple columns in R with multiple matches

I want to do something similar to what is discussed in this thread: Subset multiple columns in R - more elegant code?

I have data that looks like this:

df=data.frame(x=1:4,Col1=c("A","A","C","B"),Col2=c("A","B","B","A"),Col3=c("A","C","C","A"))
criteria="A"

I want to subset the data to the rows where the criteria is met in at least two columns, that is, where the string in at least two of the three columns is "A". In the case above, the subset would be the first and last rows of the data frame df.

Upvotes: 0

Views: 372

Answers (2)

akrun

Reputation: 887048

We can use subset with apply

subset(df, apply(df[-1] == criteria, 1, sum) > 1)
#   x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A

Upvotes: 0

Ronak Shah

Reputation: 388907

You can use rowSums:

df[rowSums(df[-1] == criteria) >= 2, ]

#  x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A
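
To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the data from the question and splits the one-liner into steps; the names hits and n_match are introduced here purely for illustration.

```r
# Data from the question
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

# Logical matrix: TRUE wherever a cell equals the criteria
# (df[-1] drops the first column, x)
hits <- df[-1] == criteria

# TRUE counts as 1, so rowSums gives the number of matching columns per row
n_match <- rowSums(hits)

# Keep only the rows with at least two matches
df[n_match >= 2, ]
```

For this data the per-row counts are 3, 1, 0 and 2, so only rows 1 and 4 pass the >= 2 threshold.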

If criteria has length > 1, you cannot use == directly, because == compares element-wise rather than testing membership. In that case, use sapply with %in%:

df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
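
As a small illustration of the multi-value case, suppose criteria were c("A", "B"). This value is an assumption made for the example; the question itself only uses a single value. The names hits and res are likewise hypothetical.

```r
# Data from the question
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))

# Hypothetical multi-value criteria (not part of the original question)
criteria <- c("A", "B")

# sapply applies %in% to each column and returns a logical matrix,
# TRUE wherever the cell is one of the values in criteria
hits <- sapply(df[-1], `%in%`, criteria)

# Keep rows where at least two columns contain one of the values
res <- df[rowSums(hits) >= 2, ]
res
```

With these values, rows 1, 2 and 4 are kept; row 3 has only one column ("B" in Col2) that matches.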

In dplyr you can use filter with rowwise:

library(dplyr)
df %>%
  rowwise() %>%
  filter(sum(c_across(starts_with('Col')) %in% criteria) >= 2)

Upvotes: 1
