Reputation: 3195
Is there a way to filter out columns based on some condition using dplyr? This is a bit confusing because it is the opposite of normal filtering.
I can't find anything directly applicable on SO. I found this and this, but they don't quite do the same thing.
Basically, instead of filtering out rows based on a column's value, I want to remove columns based on a row's value.
Here's an example using the following data frame:
df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))
I know it's a strange data set and that the columns have data of varying types. This is actually the reason for the question. I'm trying to clean this up.
Here is a base R solution, using traditional subsetting, which returns columns "ab" and "ad":
df[, df[2,] == "b"]
Is there a way to accomplish this using dplyr? I tried using filter, select, and subset to no avail, but I might be using them incorrectly in this case.
Upvotes: 3
Views: 17681
Reputation: 1984
You can use select_if, which is a scoped variant of select:
df %>%
  select_if(function(x) any(x == "b"))
# ab ad
# 1 1 0
# 2 b b
# 3 5.3 11.1
# 4 8.1 15.9
# 5 9.2 23.6
Here, I supplied a function that keeps any column containing "b".
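As a side note, since dplyr 1.0.0 the scoped `_if` verbs are superseded, and the same selection can be written with select() plus the where() helper. A minimal sketch of that equivalent, assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))

# where() wraps a predicate applied to each column;
# keep only columns that contain "b" somewhere
result <- df %>%
  select(where(~ any(.x == "b")))

names(result)
# [1] "ab" "ad"
```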
Edit based on your comment below:
df %>%
  mutate(row_n = 1:n()) %>%
  select_if(function(x) any(x == "b" & .$row_n == 2))
Here, we mutate a variable row_n indicating the row number, then use the row number as a condition in the call to select_if.
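If the condition only concerns one known row, a simpler variant of the same idea (a sketch, not the only way) is to index that row directly inside the predicate, which avoids the helper column:

```r
library(dplyr)

df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"),
                 ab = c("1", "b", "5.3", "8.1", "9.2"),
                 ac = c("0", "a", "1.8", "21.5", "16.0"),
                 ad = c("0", "b", "11.1", "15.9", "23.6"))

# Test only the second element of each column, so a "b"
# elsewhere in the column cannot cause a false match
result <- df %>%
  select_if(function(x) x[2] == "b")

names(result)
# [1] "ab" "ad"
```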
Upvotes: 8
Reputation: 238
You can use the following method:
df <- df %>%
  select(ab, ad)
A nice feature of this approach is that you can also drop columns by negation:
df <- df %>%
  select(-ab)
This selects all the columns except "ab". Hope this is what you're looking for.
Upvotes: 3