Gaspare
Gaspare

Reputation: 155

Dplyr - Filter if any variable is equal to a value

I have a dataset a with 5 variables and want to filter it like this:

a1 <- a %>% filter(var_1 != 1 , var_2 != 1 , var_3 != 1 , var_4 != 1 , variable_5 != 1)

I was wondering if anything like this (pseudo code) existed:

a1 <- a %>% filter(anyvariable != 1)

In other words I would like to get rid of all the rows with value 1, no matter where it appears. 1 is just a random number. It could have been 9, 99, or whatever else! Thanks!

Upvotes: 9

Views: 16407

Answers (6)

Josh
Josh

Reputation: 1335

@George Wood's answer works, but all_vars has been superseded by the use of if_all inside an existing verb, in this case filter instead of the scoped variant filter_all. @George Wood's answers can be updated by changing some_data %>% filter_all(all_vars(. to some_data %>% filter(if_all(.fns = ~ .x

Upvotes: 1

George Wood
George Wood

Reputation: 1984

You can use filter_all in combination with all_vars from dplyr, as follows:

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, 4, 1),
                    var3 = c(1, 6, 5))

# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b      4.00  6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))

# # A tibble: 1 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 b      4.00  6.00

This will remove rows in which a variable includes 1. In the above example, this removes the first and third rows. However, be cautious with NA values:

some_data <- tibble(var1 = c("a", "b", "c"),
                    var2 = c(2, NA, 1),
                    var3 = c(1, 6, 5))
# # A tibble: 3 x 3
#   var1   var2  var3
#   <chr> <dbl> <dbl>
# 1 a      2.00  1.00
# 2 b     NA     6.00
# 3 c      1.00  5.00

some_data %>% filter_all(all_vars(. != 1))  

# # A tibble: 0 x 3
# # ... with 3 variables: var1 <chr>, var2 <dbl>, var3 <dbl>

Note that the second row does not contain a 1, but is filtered anyway. In this specific example, you can avoid such behavior by:

some_data %>% filter_all(all_vars(. != 1 | is.na(.)))

However, this may not generalize well.

Upvotes: 9

jafelds
jafelds

Reputation: 952

Here are some convenient functions in the form OP requested:

filter_any <- function(...,test_val,na.rm=T)
{
      # JAF 20170316 filter by comparing test_val to any column, returning rows that have test_val in any column
      out <- ... %>% filter(!!rowSums(.==test_val,na.rm=na.rm))
      return(out)
}
filter_exclude <- function(...,test_val,na.rm=T)
{
      # JAF 20170316 filter by comparing test_val to every column, excluding rows that have test_val in any column
      out <- ... %>% filter(!rowSums(.==test_val,na.rm=na.rm))
      return(out)
}

Here is the result on OP's test variable:

> a
  Col1 Col2
1    1    1
2    0   24
3    9    1
4    0    0
> a %>% filter_exclude(test_val=1)
  Col1 Col2
1    0   24
2    0    0
> a %>% filter_any(test_val=1)
  Col1 Col2
1    1    1
2    9    1
>

These functions have the benefit of working without the pipe notation:

> filter_exclude(a,test_val=1)
  Col1 Col2
1    0   24
2    0    0
> filter_any(a,test_val=1)
  Col1 Col2
1    1    1
2    9    1
>

Upvotes: 2

Gopala
Gopala

Reputation: 10483

There is no filter_each in dplyr, so a solution based on rowSums is a viable one. Posting this very simple base option although one may prefer a filter solution so as to incorporate the output into the dplyr pipeline with additional operations.

set.seed(1)
df <- data.frame(x = sample(0:1, 10, replace = TRUE),
                 y = sample(0:1, 10, replace = TRUE))
df[rowSums(df == 1) == 0, ]
  x y
1 0 0
2 0 0

Modifying 1 above to whatever value will make it work for filtering on other values. This solution is considerably faster than the apply based filter solution and marginally slower than dplyr package's filter with rowSums.

Upvotes: 1

akrun
akrun

Reputation: 887148

We might be able to use rowSums

a %>% 
  filter(rowSums(. !=0) >0)
#    Col1 Col2
#1    1    1
#2    0   24
#3    9    1

If I change it to !=1

a %>% 
   filter(rowSums(. != 1) > 0)
#   Col1 Col2
#1    0   24
#2    9    1
#3    0    0

Note that this will remove the rows with all 1s. In the previous case, it removes the rows with all 0s which is consistent with what the OP mentioned in the post.

Update

If the OP wants to remove rows with any 1 (just a number, he can use 9, or 99, or 999)

a %>% 
   filter(!rowSums(.==1))
#    Col1 Col2
#1    0   24
#2    0    0

data

a <- data.frame(Col1 = c(1, 0, 9, 0), Col2 = c(1, 24, 1, 0))

Upvotes: 6

akuiper
akuiper

Reputation: 214957

You can try to combine with the apply function in the pipeline:

dput(df)
structure(list(x = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 1L), y = c(1L, 
2L, 2L, 1L, 1L, 2L, 3L, 3L), z = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA, 
-8L))

df %>% filter(!apply(., 1, function(row) any(row == 1)))
  x y z
1 2 2 2
2 2 3 2

Upvotes: 1

Related Questions