Reputation: 5169
I have the following data frame:
dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129,
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106,
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX",
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR",
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")
dat
#> A-XXX fBM-XXX P-XXX vBM-XXX
#> BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
#> BATF::JUN_IL10 0.00000000 0.000000 0.000000 0.000000
I can remove the row with all column zero with this command:
> dat <- dat[ rowSums(dat)!=0, ]
> dat
A-XXX fBM-XXX P-XXX vBM-XXX
BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
But how can I do it with dplyr's pipe style?
Upvotes: 12
Views: 17163
Reputation: 7151
With the latest tidyverse packages, across() in filter() is deprecated
. The updated solution now is:
data %>% filter(if_all(everything(.), ~. != 0))
Adding to the answer by @mgrund, a shorter alternative with dplyr 1.0.0 is:
# Option A:
data %>% filter(across(everything(.)) != 0))
# Option B:
data %>% filter(across(everything(.), ~. != 0))
Explanation:
across()
checks for every tidy_select variable, which is everything()
representing every column. In Option A, every column is checked if not zero, which adds up to a complete row of zeros in every column. In Option B, on every column, the formula (~) is applied which checks if the current column is zero.
EDIT:
As filter
already checks by row, you don't need rowwise()
. This is different for select
or mutate
.
IMPORTANT:
In Option A, it is crucial to write across(everything(.)) != 0
,
and NOT
across(everything(.) != 0))
!
Reason:
across
requires a tidyselect variable (here everything()
), not a boolean (which would be everything(.) != 0)
)
Upvotes: 4
Reputation: 848
You can use the new if_any()
. I tailored an example found in the documentation of if_any()
library(dplyr)
library(tibble)
dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129,
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106,
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX",
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR",
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")
dat
#> A-XXX fBM-XXX P-XXX vBM-XXX
#> BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
#> BATF::JUN_IL10 0.00000000 0.000000 0.000000 0.000000
dat %>%
rownames_to_column("ID") %>%
filter(if_any(!matches("ID"), ~ . != 0)) %>%
column_to_rownames("ID")
#> A-XXX fBM-XXX P-XXX vBM-XXX
#> BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
Created on 2021-04-12 by the reprex package (v1.0.0)
Upvotes: 4
Reputation: 1625
Here's another option using the row-wise operations of dplyr (with col1,col2,col3
defining three exemplary columns for which the rowwise sum is calculated):
library(tidyverse)
df <- df %>%
rowwise() %>%
filter(sum(c(col1,col2,col3)) != 0)
Alternatively, if you have tons of variables (columns) to select you can also use the tidyverse selection syntax via:
df <- df %>%
rowwise() %>%
filter(sum(c_across(col1:col3)) != 0)
For details see: https://dplyr.tidyverse.org/articles/rowwise.html
Upvotes: 1
Reputation: 15072
Here is a third option that uses purrr::pmap
to generate the indices of whether or not all rows are zero. Definitely less compact than filter_at
, but opens up options for interesting and complex conditions using pmap
!
dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129,
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106,
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX",
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR",
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")
library(tidyverse)
dat %>%
rownames_to_column() %>%
bind_cols(all_zero = pmap_lgl(., function(rowname, ...) all(list(...) == 0))) %>%
filter(all_zero == FALSE) %>%
`rownames<-`(.$rowname) %>%
select(-rowname, -all_zero)
#> A-XXX fBM-XXX P-XXX vBM-XXX
#> BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
Created on 2018-03-14 by the reprex package (v0.2.0).
Upvotes: 1
Reputation: 70266
Here's a dplyr option:
library(dplyr)
filter_all(dat, any_vars(. != 0))
# A-XXX fBM-XXX P-XXX vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000
Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.
Regarding row.names:
library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
# rowname A-XXX fBM-XXX P-XXX vBM-XXX
#1 BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
Upvotes: 12
Reputation: 887098
We could use reduce
from purrr
to get the sum of rows and filter
the dataset based on the logical vector
library(tidyverse)
dat %>%
reduce(`+`) %>%
{. != 0} %>%
filter(dat, .)
# A-XXX fBM-XXX P-XXX vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000
NOTE: Within the %>%
, the row.names gets stripped off. It may be better to create a new column or assign row.names later
If we need the row names as well, then create a row names column early and then use that to change the row names at the end
dat %>%
rownames_to_column('rn') %>%
filter(rowSums(.[-1]) != 0) %>%
`row.names<-`(., .[['rn']]) %>% select(-rn)
# A-XXX fBM-XXX P-XXX vBM-XXX
#BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
Upvotes: 3