Subsetting rows containing string By Column Names - Grepl

Question

I have a dataframe like below:

There are over 200 columns and this is just a sample.

Col1    Col2    Col2-A   Col3   Col3-A
   1       3        BA      0       BA
   2       5        BA      1       NA
   3       7        BA      0       JN
   5       9        KD      1       BA
   9      10        BA      4       NA

How do I filter/subset this dataset so that based on columns that contain -A, remove rows that contain anything other than BA or NA.

Final Expected Output:

Col1    Col2    Col2-A   Col3   Col3-A
   1       3        BA      0       BA
   2       5        BA      1       NA
   9      10        BA      4       NA

Gregor Thomas · Accepted Answer

Here's a pretty simple way:

cols = grepl("-A", names(dd))
rows = rowSums(dd[, cols] == "BA" | is.na(dd[, cols])) == sum(cols)

dd[rows, ]
#   Col1 Col2 Col2-A Col3 Col3-A
# 1    1    3     BA    0     BA
# 2    2    5     BA    1   
# 5    9   10     BA    4

Using this data:

dd = read.table(header = T, text = 'Col1    Col2    Col2-A   Col3   Col3-A
   1       3        BA      0       BA
   2       5        BA      1       NA
   3       7        BA      0       JN
   5       9        KD      1       BA
   9      10        BA      4       NA', check.names = F)

Subsetting rows containing string By Column Names - Grepl

Answers (2)

Related Questions