Maho

Reputation: 59

subset R dataframe based on matching a string across multiple columns

Apologies if this has already been asked.

Let's say I have the following dataframe:

sample = c("A", "B", "C", "D", "E")
bla_1 = c("CTX-M", NA, "CTX-M", NA, NA)
bla_2 = c(NA, "CTX-M", "OXA-1", NA, NA)
bla_3 = c(NA, "OXA-1", NA, "CTX-M", "OXA-1")
MIC = c(2, 4, 8, 16, 32)

df = data.frame(sample, bla_1, bla_2, bla_3, MIC)

I want to subset "df" so that I am left with only the samples that contain "CTX-M". How do I achieve this when "CTX-M" can appear in any of the three "bla_" columns?

Upvotes: 1

Views: 49

Answers (4)

ThomasIsCoding

Reputation: 102625

A base R option using which() with the argument arr.ind = TRUE:

> df[which(df == "CTX-M", arr.ind = TRUE)[, "row"], ]
  sample bla_1 bla_2 bla_3 MIC
1      A CTX-M  <NA>  <NA>   2
3      C CTX-M OXA-1  <NA>   8
2      B  <NA> CTX-M OXA-1   4
4      D  <NA>  <NA> CTX-M  16
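Note that the rows above come back in the order 1, 3, 2, 4, because which() walks the comparison matrix column by column. A row that matched in more than one column would also be returned more than once. If that matters, a minimal sketch (the names hits and res are only illustrative) is to de-duplicate and sort the row indices first:

```r
# Rebuild the example data from the question
df <- data.frame(
  sample = c("A", "B", "C", "D", "E"),
  bla_1  = c("CTX-M", NA, "CTX-M", NA, NA),
  bla_2  = c(NA, "CTX-M", "OXA-1", NA, NA),
  bla_3  = c(NA, "OXA-1", NA, "CTX-M", "OXA-1"),
  MIC    = c(2, 4, 8, 16, 32)
)

# Row index of every cell equal to "CTX-M" (column-major order, NAs skipped)
hits <- which(df == "CTX-M", arr.ind = TRUE)[, "row"]

# Drop repeated row indices and restore the original row order
res <- df[sort(unique(hits)), ]
res  # rows 1, 2, 3, 4
```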

Upvotes: 1

Chris Ruehlemann

Reputation: 21442

A base R solution:

df[which(apply(df, 1, function(x) any(x == "CTX-M"))), ]
  sample bla_1 bla_2 bla_3 MIC
1      A CTX-M  <NA>  <NA>   2
2      B  <NA> CTX-M OXA-1   4
3      C CTX-M OXA-1  <NA>   8
4      D  <NA>  <NA> CTX-M  16
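The apply() call above scans every column, including sample and MIC. Since the question is only about the "bla_" columns, a possible variant is to restrict the comparison to those columns and count matches per row with rowSums(). This is a sketch, assuming all relevant columns share the "bla_" prefix; bla_cols and keep are illustrative names:

```r
# Rebuild the example data from the question
df <- data.frame(
  sample = c("A", "B", "C", "D", "E"),
  bla_1  = c("CTX-M", NA, "CTX-M", NA, NA),
  bla_2  = c(NA, "CTX-M", "OXA-1", NA, NA),
  bla_3  = c(NA, "OXA-1", NA, "CTX-M", "OXA-1"),
  MIC    = c(2, 4, 8, 16, 32)
)

# Pick the columns whose names start with "bla_" (assumed naming scheme)
bla_cols <- grep("^bla_", names(df), value = TRUE)

# Count exact matches per row; na.rm = TRUE treats NA cells as non-matches
keep <- rowSums(df[bla_cols] == "CTX-M", na.rm = TRUE) > 0

res <- df[keep, ]
```

Because the comparison is vectorized over the whole sub-dataframe, this avoids the row-by-row function call and the coercion of every column to character that apply() performs.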

Upvotes: 2

akrun

Reputation: 887851

We can use filter() with if_any():

library(dplyr)
library(stringr)
df %>%
     filter(if_any(everything(), ~ str_detect(., 'CTX-M')))

Output:

#  sample bla_1 bla_2 bla_3 MIC
#1      A CTX-M  <NA>  <NA>   2
#2      B  <NA> CTX-M OXA-1   4
#3      C CTX-M OXA-1  <NA>   8
#4      D  <NA>  <NA> CTX-M  16

Or, for specific columns:

df %>%
    filter(if_any(bla_1:bla_3, ~ str_detect(., 'CTX-M')))
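One caveat: str_detect() does pattern matching, so it would also keep cells that merely contain "CTX-M" as part of a longer value. If only exact matches are wanted, a possible variant is to test with %in%, which also returns FALSE rather than NA for missing cells. This is a sketch, assuming dplyr 1.0.4 or later (for if_any()) and that the relevant columns all start with "bla_"; exact is an illustrative name:

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  sample = c("A", "B", "C", "D", "E"),
  bla_1  = c("CTX-M", NA, "CTX-M", NA, NA),
  bla_2  = c(NA, "CTX-M", "OXA-1", NA, NA),
  bla_3  = c(NA, "OXA-1", NA, "CTX-M", "OXA-1"),
  MIC    = c(2, 4, 8, 16, 32)
)

# Keep rows where any "bla_" column is exactly "CTX-M";
# %in% yields FALSE for NA, so no NA handling is needed
exact <- df %>%
  filter(if_any(starts_with("bla_"), ~ .x %in% "CTX-M"))
```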

Upvotes: 1

user63230

Reputation: 4708

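Is this what you are looking for? (Note that filter_all() and filter_at() are superseded as of dplyr 1.0.0; filter() with if_any() is the current equivalent.)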

library(tidyverse)
df %>% 
  filter_all(any_vars(str_detect(., "CTX-M")))
#   sample bla_1 bla_2 bla_3 MIC
# 1      A CTX-M  <NA>  <NA>   2
# 2      B  <NA> CTX-M OXA-1   4
# 3      C CTX-M OXA-1  <NA>   8
# 4      D  <NA>  <NA> CTX-M  16

Or, looking only at specific columns:

df %>% 
  filter_at(vars(bla_1, bla_2, bla_3), any_vars(str_detect(., "CTX-M")))
#   sample bla_1 bla_2 bla_3 MIC
# 1      A CTX-M  <NA>  <NA>   2
# 2      B  <NA> CTX-M OXA-1   4
# 3      C CTX-M OXA-1  <NA>   8
# 4      D  <NA>  <NA> CTX-M  16

Upvotes: 2
