E. Moore
E. Moore

Reputation: 341

R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods?

I'd like to remove all rows that sum to 0, but I have factor columns in the first 2 columns. I've come up with a dplyr solution, creating an intermediate rowsum column, filtering out rows that sum to 0, then removing that rowsum column.

I'd like to find a way for this to work without creating that unnecessary rowsum column, both using base R and a dplyr/tidyverse pipe-friendly method. Surely there's an easy, one-line piece of code to make this happen?

library(tidyverse)

df <- data.frame(person = rep(c("Ed", "Sue"), 6),
                id = paste0("plot",1:12),
                a = c(2, 0, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0),
                b = c(0, 0, 6, 4, 0, 8, 1, 0, 0, 0, 1, 1),
                c = c(4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 8),
                d = c(0, 0, 0, 3, 0, 1, 0, 0, 9, 0, 1, 5),
                e = c(7, 0, 5, 0, 0, 1, 0, 0, 0, 0, 7, 0))


##create intermediate 'row.sum' column, filter rows that have all 0's, then remove row.sum column
df1 <- df %>% 
  dplyr::mutate(row.sum = a+b+c+d+e) %>% 
  dplyr::filter(row.sum != 0) %>% 
  dplyr::select(-row.sum)


#end result:
#  person     id a b c d e
#1     Ed  plot1 2 0 4 0 7
#2     Ed  plot3 0 6 0 0 5
#3    Sue  plot4 0 4 0 3 0
#4    Sue  plot6 1 8 0 1 1
#5     Ed  plot7 0 1 0 0 0
#6     Ed  plot9 4 0 0 9 0
#7     Ed plot11 0 1 3 1 7
#8    Sue plot12 0 1 8 5 0

Upvotes: 4

Views: 4997

Answers (5)

GuedesBF
GuedesBF

Reputation: 9878

A dplyr method

You can apply rowSums to numeric columns only, using dplyrs filter() and across(), with the helper where(is.numeric):

library(dplyr)

df%>%filter(rowSums(across(where(is.numeric)))!=0)

  person     id a b c d e
1     Ed  plot1 2 0 4 0 7
2     Ed  plot3 0 6 0 0 5
3    Sue  plot4 0 4 0 3 0
4    Sue  plot6 1 8 0 1 1
5     Ed  plot7 0 1 0 0 0
6     Ed  plot9 4 0 0 9 0
7     Ed plot11 0 1 3 1 7
8    Sue plot12 0 1 8 5 0

This method (and some of those that depend on rowSums()) can fail if your numeric columns have negative values as well. In that case we must make sure that we keep only the rows that contain at least any()non-zero values. This can be done by modifying the rowSums() to include the condition .x!=0inside across():

df%>%filter(rowSums(across(where(is.numeric), ~.x!=0))>0)

Or with logical operators and Reduce()/reduce(), with the following code:

library(dplyr)
library(purrr)

df%>%filter(pmap_lgl(select(., where(is.numeric)), ~any(c(...)!=0)))

#or with purrr:reduce()#

df%>%filter(across(where(is.numeric), ~.x!=0)%>%reduce(`|`))
#or simply
df%>%filter(reduce(across(where(is.numeric), ~.x!=0), `|`))

a base R method

You can use base subsetting with [, with sapply(f, is.numeric) to create a logical index to select only numerical columns to feed to the inequality operator !=, then take the rowSums() of the final logical matrix that is created and select only rows in which the rowSums is >0:

df[rowSums(df[,sapply(df, is.numeric)]!=0)>0,]

EDIT

We can benefit from the coercion that comes from calling logical functions on numeric vectors. as.logical() will evaluate zeroes to FALSE and any non-zero numbers to TRUE. x|x and nested bang signs !(!) will do the same. This is consistent with the other solutions that compare elements to ZERO, and is therefore more consistent than the rowSumssolution.

An example:

vector<-c(0,1,2,-1)
identical(as.logical(vector), vector|vector, vector!=0, !(!vector))

[1] TRUE

There are some neat ways to solve this with that in mind:

df%>%filter(reduce(across(where(is.numeric), as.logical), `|`))
#or simply
df%>%filter(reduce(across(where(is.numeric)), `|`))
#and with base R:
df[Reduce(`|`, df[sapply(df, is.numeric)]),]

And the cleanest so far, with the new if_any():

df%>%filter(if_any(where(is.numeric)))

Upvotes: 10

IRTFM
IRTFM

Reputation: 263481

This is kind of mixture of base logic and tidy syntax. It's admittedly a bit tortured to get around the alternate syntax that is rlang.

df1 <- df %>% filter(!(!rowSums(`[`(.,,3:7))))
> df1
  person     id a b c d e
1     Ed  plot1 2 0 4 0 7
2     Ed  plot3 0 6 0 0 5
3    Sue  plot4 0 4 0 3 0
4    Sue  plot6 1 8 0 1 1
5     Ed  plot7 0 1 0 0 0
6     Ed  plot9 4 0 0 9 0
7     Ed plot11 0 1 3 1 7
8    Sue plot12 0 1 8 5 0

Need to separate the two exclamation points with a paren, because !! is a different operation in "rlang" which is the logical environment for filter. (And note that rwoSums does have an na.rm argument if there are possibly NA's and those should be ignored.

And here's a base solution. Was !! working as a substitute for != 0

df1 <- df[ as.logical(rowSums(df[3:7]) ), ]

(I think it's probably better to stick with != 0)

Upvotes: 2

TarJae
TarJae

Reputation: 79276

We could calculate the row sums row wise and use slice

library(dplyr)
df %>%
  rowwise() %>% 
  slice(unique(c(which(sum(c_across(where(is.numeric))) != 0))))

Output:

  person id         a     b     c     d     e
  <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ed     plot1      2     0     4     0     7
2 Ed     plot3      0     6     0     0     5
3 Sue    plot4      0     4     0     3     0
4 Sue    plot6      1     8     0     1     1
5 Ed     plot7      0     1     0     0     0
6 Ed     plot9      4     0     0     9     0
7 Ed     plot11     0     1     3     1     7
8 Sue    plot12     0     1     8     5     0

Upvotes: 1

AnilGoyal
AnilGoyal

Reputation: 26238

Alternative without using rowSums

  • just used all inside rowwise filter converting desired vals into logical testing == 0 with cur_data()

library(dplyr)

df %>% rowwise() %>%
  filter(!all(cur_data()[-c(1:2)] == 0))

#> # A tibble: 8 x 7
#> # Rowwise: 
#>   person id         a     b     c     d     e
#>   <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Ed     plot1      2     0     4     0     7
#> 2 Ed     plot3      0     6     0     0     5
#> 3 Sue    plot4      0     4     0     3     0
#> 4 Sue    plot6      1     8     0     1     1
#> 5 Ed     plot7      0     1     0     0     0
#> 6 Ed     plot9      4     0     0     9     0
#> 7 Ed     plot11     0     1     3     1     7
#> 8 Sue    plot12     0     1     8     5     0

Created on 2021-05-30 by the reprex package (v2.0.0)

Upvotes: 3

Karthik S
Karthik S

Reputation: 11546

Does this work:

subset(df, rowSums(df[3:7]) != 0)
   person     id a b c d e
1      Ed  plot1 2 0 4 0 7
3      Ed  plot3 0 6 0 0 5
4     Sue  plot4 0 4 0 3 0
6     Sue  plot6 1 8 0 1 1
7      Ed  plot7 0 1 0 0 0
9      Ed  plot9 4 0 0 9 0
11     Ed plot11 0 1 3 1 7
12    Sue plot12 0 1 8 5 0

Upvotes: 2

Related Questions