Reputation: 25
This is an embarrassingly simple question but I'm genuinely stuck and none of the other threads seem to address it.
I have a dataset that has over 20,000 rows, and there is one column that contains multiple codes explaining which demographic criteria the individual occupies.
Data:
ORGNAME | D_CODE |
---|---|
A | ~001, ~002 |
A | ~001 |
B | ~003, ~004 |
B | ~001, ~005 |
B | ~002, ~004 |
C | ~001 |
I want to subset the data whereby I only keep rows that contain ~001, but I want this to include rows that also contain other values (i.e. row 1 which has ~001 and ~002).
I have tried using %>%
, filter
, subset
, etc. but although they select ~001 rows, they also remove entries that have ~001 and additional codes, so using the example data above, instead of ending up with 4 rows, I end up with only 2.
Any solutions? Thank you so much!
Upvotes: 2
Views: 238
Reputation: 11584
Does this work:
library(dplyr)
library(stringr)
df %>% filter(str_detect(D_CODE, '001'))
ORGNAME D_CODE
1 A ~001, ~002
2 A ~001
3 B ~001, ~005
4 C ~001
Data used:
df
ORGNAME D_CODE
1 A ~001, ~002
2 A ~001
3 B ~003, ~004
4 B ~001, ~005
5 B ~002, ~004
6 C ~001
Upvotes: 1