Subset data by presence of multiple values in a single cell

Question

This is an embarrassingly simple question, but I'm genuinely stuck and none of the other threads seem to address it.

I have a dataset with over 20,000 rows. One column, D_CODE, contains one or more comma-separated codes indicating which demographic criteria apply to each individual.

Data:

ORGNAME	D_CODE
A	~001, ~002
A	~001
B	~003, ~004
B	~001, ~005
B	~002, ~004
C	~001
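To make the example reproducible, the sample table above can be rebuilt as an R data frame. This is a sketch: the name `df` is an assumption chosen to match the answer below, and `stringsAsFactors = FALSE` simply keeps D_CODE as plain character strings (already the default since R 4.0).

```r
# Reconstruction of the sample data shown above.
# `df` is an assumed name, matching the one used in the answer.
df <- data.frame(
  ORGNAME = c("A", "A", "B", "B", "B", "C"),
  D_CODE  = c("~001, ~002", "~001", "~003, ~004",
              "~001, ~005", "~002, ~004", "~001"),
  stringsAsFactors = FALSE
)
str(df)
```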

I want to subset the data so that I keep only rows whose D_CODE contains ~001, including rows that also contain other codes (e.g. row 1, which has both ~001 and ~002).

I have tried using %>% with filter, subset, etc., but although they select rows with ~001, they also drop rows that have ~001 together with other codes. With the example data above, I should end up with 4 rows, but I get only 2.
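The question doesn't show the exact call that was tried, so this is an assumption: a common cause is an equality test such as `D_CODE == "~001"`, which compares the whole cell. Only cells that are exactly `~001` pass, which reproduces the 2-row result, whereas a substring test keeps all 4 matching rows.

```r
df <- data.frame(
  ORGNAME = c("A", "A", "B", "B", "B", "C"),
  D_CODE  = c("~001, ~002", "~001", "~003, ~004",
              "~001, ~005", "~002, ~004", "~001"),
  stringsAsFactors = FALSE
)

# Equality compares the entire cell, so "~001, ~002" is not equal to "~001"
exact <- subset(df, D_CODE == "~001")
nrow(exact)     # 2

# A substring test keeps any cell that contains "~001" somewhere
contains <- subset(df, grepl("~001", D_CODE, fixed = TRUE))
nrow(contains)  # 4
```

The same distinction holds with dplyr's filter(): the condition inside it decides which rows survive, not the pipe.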

Any solutions? Thank you so much!

akrun · Accepted Answer

Using base R, keep the rows where grepl finds '001' anywhere in D_CODE:

subset(df, grepl('001', D_CODE))
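One caveat: a plain pattern like '001' matches anywhere in the string, so a longer code such as ~0010 or ~1001 (hypothetical; not in the sample data) would also be kept. If the real data can contain such codes, a stricter sketch splits each cell on commas and checks for an exact ~001 token. The comma separator is inferred from the sample data, and the helper name `has_code` is hypothetical.

```r
df <- data.frame(
  ORGNAME = c("A", "A", "B", "B", "B", "C"),
  D_CODE  = c("~001, ~002", "~001", "~003, ~004",
              "~001, ~005", "~002, ~004", "~001"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each cell whose comma-separated
# list of codes contains `code` as a whole token.
has_code <- function(x, code) {
  tokens <- strsplit(x, ",\\s*")  # split on a comma plus optional spaces
  vapply(tokens, function(t) code %in% trimws(t), logical(1))
}

kept <- subset(df, has_code(D_CODE, "~001"))
kept
```

Unlike grepl('001', ...), this rejects a cell such as "~0010, ~002", while still keeping "~001, ~002".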
