Reputation: 341
I'd like to remove all rows that sum to 0, but I have factor columns in the first 2 columns. I've come up with a dplyr solution, creating an intermediate rowsum column, filtering out rows that sum to 0, then removing that rowsum column.
I'd like to find a way for this to work without creating that unnecessary rowsum column, both using base R and a dplyr/tidyverse pipe-friendly method. Surely there's an easy, one-line piece of code to make this happen?
library(tidyverse)
df <- data.frame(person = rep(c("Ed", "Sue"), 6),
id = paste0("plot",1:12),
a = c(2, 0, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0),
b = c(0, 0, 6, 4, 0, 8, 1, 0, 0, 0, 1, 1),
c = c(4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 8),
d = c(0, 0, 0, 3, 0, 1, 0, 0, 9, 0, 1, 5),
e = c(7, 0, 5, 0, 0, 1, 0, 0, 0, 0, 7, 0))
##create intermediate 'row.sum' column, filter rows that have all 0's, then remove row.sum column
df1 <- df %>%
dplyr::mutate(row.sum = a+b+c+d+e) %>%
dplyr::filter(row.sum != 0) %>%
dplyr::select(-row.sum)
#end result:
# person id a b c d e
#1 Ed plot1 2 0 4 0 7
#2 Ed plot3 0 6 0 0 5
#3 Sue plot4 0 4 0 3 0
#4 Sue plot6 1 8 0 1 1
#5 Ed plot7 0 1 0 0 0
#6 Ed plot9 4 0 0 9 0
#7 Ed plot11 0 1 3 1 7
#8 Sue plot12 0 1 8 5 0
Upvotes: 4
Views: 4997
Reputation: 9878
A dplyr method
You can apply rowSums to numeric columns only, using dplyrs filter()
and across()
, with the helper where(is.numeric)
:
library(dplyr)
df%>%filter(rowSums(across(where(is.numeric)))!=0)
person id a b c d e
1 Ed plot1 2 0 4 0 7
2 Ed plot3 0 6 0 0 5
3 Sue plot4 0 4 0 3 0
4 Sue plot6 1 8 0 1 1
5 Ed plot7 0 1 0 0 0
6 Ed plot9 4 0 0 9 0
7 Ed plot11 0 1 3 1 7
8 Sue plot12 0 1 8 5 0
This method (and some of those that depend on rowSums()
) can fail if your numeric columns have negative values as well.
In that case we must make sure that we keep only the rows that contain at least any()
non-zero values. This can be done by modifying the rowSums()
to include the condition .x!=0
inside across()
:
df%>%filter(rowSums(across(where(is.numeric), ~.x!=0))>0)
Or with logical operators and Reduce()/reduce()
, with the following code:
library(dplyr)
library(purrr)
df%>%filter(pmap_lgl(select(., where(is.numeric)), ~any(c(...)!=0)))
#or with purrr:reduce()#
df%>%filter(across(where(is.numeric), ~.x!=0)%>%reduce(`|`))
#or simply
df%>%filter(reduce(across(where(is.numeric), ~.x!=0), `|`))
a base R method
You can use base subsetting with [
, with sapply(f, is.numeric)
to create a logical index to select only numerical columns to feed to the inequality operator !=
, then take the rowSums()
of the final logical matrix that is created and select only rows in which the rowSums is >0:
df[rowSums(df[,sapply(df, is.numeric)]!=0)>0,]
EDIT
We can benefit from the coercion that comes from calling logical functions on numeric vectors. as.logical()
will evaluate zeroes to FALSE and any non-zero numbers to TRUE. x|x
and nested bang signs !(!)
will do the same. This is consistent with the other solutions that compare elements to ZERO, and is therefore more consistent than the rowSums
solution.
An example:
vector<-c(0,1,2,-1)
identical(as.logical(vector), vector|vector, vector!=0, !(!vector))
[1] TRUE
There are some neat ways to solve this with that in mind:
df%>%filter(reduce(across(where(is.numeric), as.logical), `|`))
#or simply
df%>%filter(reduce(across(where(is.numeric)), `|`))
#and with base R:
df[Reduce(`|`, df[sapply(df, is.numeric)]),]
And the cleanest so far, with the new if_any()
:
df%>%filter(if_any(where(is.numeric)))
Upvotes: 10
Reputation: 263481
This is kind of mixture of base logic and tidy syntax. It's admittedly a bit tortured to get around the alternate syntax that is rlang.
df1 <- df %>% filter(!(!rowSums(`[`(.,,3:7))))
> df1
person id a b c d e
1 Ed plot1 2 0 4 0 7
2 Ed plot3 0 6 0 0 5
3 Sue plot4 0 4 0 3 0
4 Sue plot6 1 8 0 1 1
5 Ed plot7 0 1 0 0 0
6 Ed plot9 4 0 0 9 0
7 Ed plot11 0 1 3 1 7
8 Sue plot12 0 1 8 5 0
Need to separate the two exclamation points with a paren, because !!
is a different operation in "rlang" which is the logical environment for filter
. (And note that rwoSums does have an na.rm
argument if there are possibly NA's and those should be ignored.
And here's a base solution. Was !!
working as a substitute for != 0
df1 <- df[ as.logical(rowSums(df[3:7]) ), ]
(I think it's probably better to stick with != 0
)
Upvotes: 2
Reputation: 79276
We could calculate the row sums row wise and use slice
library(dplyr)
df %>%
rowwise() %>%
slice(unique(c(which(sum(c_across(where(is.numeric))) != 0))))
Output:
person id a b c d e
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ed plot1 2 0 4 0 7
2 Ed plot3 0 6 0 0 5
3 Sue plot4 0 4 0 3 0
4 Sue plot6 1 8 0 1 1
5 Ed plot7 0 1 0 0 0
6 Ed plot9 4 0 0 9 0
7 Ed plot11 0 1 3 1 7
8 Sue plot12 0 1 8 5 0
Upvotes: 1
Reputation: 26238
Alternative without using rowSums
all
inside rowwise
filter
converting desired vals into logical testing == 0
with cur_data()
library(dplyr)
df %>% rowwise() %>%
filter(!all(cur_data()[-c(1:2)] == 0))
#> # A tibble: 8 x 7
#> # Rowwise:
#> person id a b c d e
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Ed plot1 2 0 4 0 7
#> 2 Ed plot3 0 6 0 0 5
#> 3 Sue plot4 0 4 0 3 0
#> 4 Sue plot6 1 8 0 1 1
#> 5 Ed plot7 0 1 0 0 0
#> 6 Ed plot9 4 0 0 9 0
#> 7 Ed plot11 0 1 3 1 7
#> 8 Sue plot12 0 1 8 5 0
Created on 2021-05-30 by the reprex package (v2.0.0)
Upvotes: 3
Reputation: 11546
Does this work:
subset(df, rowSums(df[3:7]) != 0)
person id a b c d e
1 Ed plot1 2 0 4 0 7
3 Ed plot3 0 6 0 0 5
4 Sue plot4 0 4 0 3 0
6 Sue plot6 1 8 0 1 1
7 Ed plot7 0 1 0 0 0
9 Ed plot9 4 0 0 9 0
11 Ed plot11 0 1 3 1 7
12 Sue plot12 0 1 8 5 0
Upvotes: 2