Reputation: 155
I have a dataset a
with 5 variables and want to filter it like this:
a1 <- a %>% filter(var_1 != 1 , var_2 != 1 , var_3 != 1 , var_4 != 1 , variable_5 != 1)
I was wondering if anything like this (pseudo code) existed:
a1 <- a %>% filter(anyvariable != 1)
In other words I would like to get rid of all the rows with value 1, no matter where it appears. 1 is just a random number. It could have been 9, 99, or whatever else! Thanks!
Upvotes: 9
Views: 16407
Reputation: 1335
@George Wood's answer works, but all_vars
has been superseded by the use of if_all
inside an existing verb, in this case filter
instead of the scoped variant filter_all
. @George Wood's answers can be updated by changing some_data %>% filter_all(all_vars(.
to some_data %>% filter(if_all(.fns = ~ .x
Upvotes: 1
Reputation: 1984
You can use filter_all
in combination with all_vars
from dplyr
, as follows:
some_data <- tibble(var1 = c("a", "b", "c"),
var2 = c(2, 4, 1),
var3 = c(1, 6, 5))
# # A tibble: 3 x 3
# var1 var2 var3
# <chr> <dbl> <dbl>
# 1 a 2.00 1.00
# 2 b 4.00 6.00
# 3 c 1.00 5.00
some_data %>% filter_all(all_vars(. != 1))
# # A tibble: 1 x 3
# var1 var2 var3
# <chr> <dbl> <dbl>
# 1 b 4.00 6.00
This will remove rows in which a variable includes 1. In the above example, this removes the first and third rows. However, be cautious with NA values:
some_data <- tibble(var1 = c("a", "b", "c"),
var2 = c(2, NA, 1),
var3 = c(1, 6, 5))
# # A tibble: 3 x 3
# var1 var2 var3
# <chr> <dbl> <dbl>
# 1 a 2.00 1.00
# 2 b NA 6.00
# 3 c 1.00 5.00
some_data %>% filter_all(all_vars(. != 1))
# # A tibble: 0 x 3
# # ... with 3 variables: var1 <chr>, var2 <dbl>, var3 <dbl>
Note that the second row does not contain a 1, but is filtered anyway. In this specific example, you can avoid such behavior by:
some_data %>% filter_all(all_vars(. != 1 | is.na(.)))
However, this may not generalize well.
Upvotes: 9
Reputation: 952
Here are some convenient functions in the form OP requested:
filter_any <- function(...,test_val,na.rm=T)
{
# JAF 20170316 filter by comparing test_val to any column, returning rows that have test_val in any column
out <- ... %>% filter(!!rowSums(.==test_val,na.rm=na.rm))
return(out)
}
filter_exclude <- function(...,test_val,na.rm=T)
{
# JAF 20170316 filter by comparing test_val to every column, excluding rows that have test_val in any column
out <- ... %>% filter(!rowSums(.==test_val,na.rm=na.rm))
return(out)
}
Here is the result on OP's test variable:
> a
Col1 Col2
1 1 1
2 0 24
3 9 1
4 0 0
> a %>% filter_exclude(test_val=1)
Col1 Col2
1 0 24
2 0 0
> a %>% filter_any(test_val=1)
Col1 Col2
1 1 1
2 9 1
>
These functions have the benefit of working without the pipe notation:
> filter_exclude(a,test_val=1)
Col1 Col2
1 0 24
2 0 0
> filter_any(a,test_val=1)
Col1 Col2
1 1 1
2 9 1
>
Upvotes: 2
Reputation: 10483
There is no filter_each
in dplyr
, so a solution based on rowSums is a viable one. Posting this very simple base option although one may prefer a filter
solution so as to incorporate the output into the dplyr
pipeline with additional operations.
set.seed(1)
df <- data.frame(x = sample(0:1, 10, replace = TRUE),
y = sample(0:1, 10, replace = TRUE))
df[rowSums(df == 1) == 0, ]
x y
1 0 0
2 0 0
Modifying 1
above to whatever value will make it work for filtering on other values. This solution is considerably faster than the apply
based filter
solution and marginally slower than dplyr
package's filter
with rowSums
.
Upvotes: 1
Reputation: 887148
We might be able to use rowSums
a %>%
filter(rowSums(. !=0) >0)
# Col1 Col2
#1 1 1
#2 0 24
#3 9 1
If I change it to !=1
a %>%
filter(rowSums(. != 1) > 0)
# Col1 Col2
#1 0 24
#2 9 1
#3 0 0
Note that this will remove the rows with all 1s. In the previous case, it removes the rows with all 0s which is consistent with what the OP mentioned in the post.
If the OP wants to remove rows with any 1 (just a number, he can use 9, or 99, or 999)
a %>%
filter(!rowSums(.==1))
# Col1 Col2
#1 0 24
#2 0 0
a <- data.frame(Col1 = c(1, 0, 9, 0), Col2 = c(1, 24, 1, 0))
Upvotes: 6
Reputation: 214957
You can try to combine with the apply
function in the pipeline:
dput(df)
structure(list(x = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 1L), y = c(1L,
2L, 2L, 1L, 1L, 2L, 3L, 3L), z = c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L)), .Names = c("x", "y", "z"), class = "data.frame", row.names = c(NA,
-8L))
df %>% filter(!apply(., 1, function(row) any(row == 1)))
x y z
1 2 2 2
2 2 3 2
Upvotes: 1