Reputation: 4243
I have a dataframe like below:
There are over 200 columns and this is just a sample.
Col1 Col2 Col2-A Col3 Col3-A
1 3 BA 0 BA
2 5 BA 1 NA
3 7 BA 0 JN
5 9 KD 1 BA
9 10 BA 4 NA
How do I filter/subset this dataset so that based on columns that contain -A
, remove rows that contain anything other than BA
or NA
.
Final Expected Output:
Col1 Col2 Col2-A Col3 Col3-A
1 3 BA 0 BA
2 5 BA 1 NA
9 10 BA 4 NA
Upvotes: 0
Views: 410
Reputation: 39858
With dplyr
you can do:
df %>%
filter_at(vars(contains(".A")), all_vars(grepl("BA", .) | is.na(.)))
Col1 Col2 Col2.A Col3 Col3.A
1 1 3 BA 0 BA
2 2 5 BA 1 <NA>
3 9 10 BA 4 <NA>
It filters based on variables that contains ".A" and keeps the rows where all variables are "BA" or NA.
Or a simplified version based on a post from @Gregor:
df %>%
filter_at(vars(contains(".A")), all_vars(. == "BA" | is.na(.)))
Sample data:
df <- read.table(text = "Col1 Col2 Col2-A Col3 Col3-A
1 3 BA 0 BA
2 5 BA 1 NA
3 7 BA 0 JN
5 9 KD 1 BA
9 10 BA 4 NA", header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 145755
Here's a pretty simple way:
cols = grepl("-A", names(dd))
rows = rowSums(dd[, cols] == "BA" | is.na(dd[, cols])) == sum(cols)
dd[rows, ]
# Col1 Col2 Col2-A Col3 Col3-A
# 1 1 3 BA 0 BA
# 2 2 5 BA 1 <NA>
# 5 9 10 BA 4 <NA>
Using this data:
dd = read.table(header = T, text = 'Col1 Col2 Col2-A Col3 Col3-A
1 3 BA 0 BA
2 5 BA 1 NA
3 7 BA 0 JN
5 9 KD 1 BA
9 10 BA 4 NA', check.names = F)
Upvotes: 1