val678
val678

Reputation: 25

Subset data by presence of multiple values in a single cell

This is an embarrassingly simple question but I'm genuinely stuck and none of the other threads seem to address it.

I have a dataset that has over 20,000 rows, and there is one column that contains multiple codes explaining which demographic criteria the individual occupies.

Data:

ORGNAME D_CODE
A ~001, ~002
A ~001
B ~003, ~004
B ~001, ~005
B ~002, ~004
C ~001

I want to subset the data whereby I only keep rows that contain ~001, but I want this to include rows that also contain other values (i.e. row 1 which has ~001 and ~002).

I have tried using %>%, filter, subset, etc. but although they select ~001 rows, they also remove entries that have ~001 and additional codes, so using the example data above, instead of ending up with 4 rows, I end up with only 2.

Any solutions? Thank you so much!

Upvotes: 2

Views: 238

Answers (2)

akrun
akrun

Reputation: 887138

Using base R with grepl

subset(df, grepl('001', D_CODE))

Upvotes: 0

Karthik S
Karthik S

Reputation: 11584

Does this work:

library(dplyr)
library(stringr)
df %>% filter(str_detect(D_CODE, '001'))
  ORGNAME     D_CODE
1       A ~001, ~002
2       A       ~001
3       B ~001, ~005
4       C       ~001

Data used:

df
  ORGNAME     D_CODE
1       A ~001, ~002
2       A       ~001
3       B ~003, ~004
4       B ~001, ~005
5       B ~002, ~004
6       C       ~001

Upvotes: 1

Related Questions