Reputation: 3195
Is there a way to filter out columns based on some condition using dplyr? This is a bit confusing because it is the opposite of normal filtering.
I can't find anything directly applicable on SO. I found this and this, but they don't quite do the same thing.
Basically, instead of filtering out rows based on a column's value, I want to remove columns based on a row's value.
Here's an example using the following data frame:
df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))
I know it's a strange data set and that the columns have data of varying types. This is actually the reason for the question. I'm trying to clean this up.
Here is a base R solution, using traditional subsetting, which returns columns "ab" and "ad":
df[, df[2,] == "b"]
Is there a way to accomplish this using dplyr? I tried using filter, select, and subset to no avail, but I might be using them incorrectly in this case.
Upvotes: 3
Views: 17681
Reputation: 1984
You can use select_if, which is a scoped variant of select:
df %>%
  select_if(function(x) any(x == "b"))
# ab ad
# 1 1 0
# 2 b b
# 3 5.3 11.1
# 4 8.1 15.9
# 5 9.2 23.6
Here, I supplied a function that keeps any column containing "b".
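As a side note, since dplyr 1.0.0 the scoped `_if` verbs are superseded, and the same selection can be written with select() plus the where() helper. A minimal sketch of that equivalent, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))

# where() wraps a predicate applied to each column;
# keep only columns that contain "b" somewhere
result <- df %>%
  select(where(~ any(.x == "b")))

names(result)
# [1] "ab" "ad"
```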
Edit based on your comment below:
df %>%
  mutate(row_n = 1:n()) %>%
  select_if(function(x) any(x == "b" & .$row_n == 2))
Here, we mutate a variable row_n indicating the row number, then use the row number as a condition in the call to select_if.
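If the condition only concerns one known row, a simpler variant of the same idea (a sketch, not the only way) is to index that row directly inside the predicate, which avoids the helper column:

```r
library(dplyr)

df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))

# Test only the second element of each column, so a "b"
# elsewhere in the column cannot cause a false match
result <- df %>%
  select_if(function(x) x[2] == "b")

names(result)
# [1] "ab" "ad"
```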
Upvotes: 8
Reputation: 238
You can use the following method:
df <- df %>%
  select(ab, ad)
A nice feature of this approach is that you can also drop columns by negation:
df <- df %>%
  select(-ab)
This selects all the columns except "ab". Hope this is what you're looking for.
Upvotes: 3