Sum across rows but only the cells that meet a condition

Question

Sample data:

library(tibble)

df <- tibble(x = c(0.1, 0.2, 0.3, 0.4),
             y = c(0.1, 0.1, 0.2, 0.3),
             z = c(0.1, 0.2, 0.2, 0.2))
df
# A tibble: 4 x 3
      x     y     z
  <dbl> <dbl> <dbl>
1   0.1   0.1   0.1
2   0.2   0.1   0.2
3   0.3   0.2   0.2
4   0.4   0.3   0.2

I want to sum across rows, adding up only the "cells" that meet a certain logical condition. In this example, I want to add up, rowwise, only the cells that contain a value equal to or greater than a specified threshold.

Desired Output

threshold <- 0.15
# A tibble: 4 x 4
      x     y     z cond_sum
  <dbl> <dbl> <dbl>    <dbl>
1   0.1   0.1   0.1      0  
2   0.2   0.1   0.2      0.4
3   0.3   0.2   0.2      0.7
4   0.4   0.3   0.2      0.9

Pseudo-code

This is the wrangling idea I have in mind.

df %>%
  rowwise() %>%
  mutate(cond_sum = sum(c_across(where(~ "cell" >= threshold))))

Tidy solutions appreciated!

akrun · Accepted Answer

An efficient option is to replace the values below the threshold with NA and use na.rm = TRUE in rowSums, instead of rowwise/c_across. This is vectorized over the whole data frame, so it avoids the per-row overhead of rowwise.

library(dplyr)
df %>% 
  mutate(cond_sum = rowSums(replace(., . < threshold, NA), na.rm = TRUE))

-output

# A tibble: 4 x 4
#      x     y     z cond_sum
#  <dbl> <dbl> <dbl>    <dbl>
#1   0.1   0.1   0.1      0  
#2   0.2   0.1   0.2      0.4
#3   0.3   0.2   0.2      0.7
#4   0.4   0.3   0.2      0.9
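
To see why this works, it can help to look at the intermediate steps one at a time. The sketch below is only an illustration: it uses a plain data.frame copy of the question's data so it runs without extra packages, and the names below and masked are introduced here, not part of the original answer.

```r
# Plain data.frame copy of the question's data (assumption: base R only)
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

# Logical matrix: TRUE where a cell is below the threshold
below <- df < threshold

# Same data with the below-threshold cells set to NA
masked <- replace(df, below, NA)

# Row sums that skip the NA cells; a row that is entirely NA sums to 0
cond_sum <- rowSums(masked, na.rm = TRUE)
```

Note that na.rm = TRUE also drops any NA values that were already in the data, so a genuinely missing cell is treated the same as a cell below the threshold.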

Or with c_across

df %>% 
  rowwise() %>%
  # collect the current row's values, then sum only those at or above the threshold
  mutate(cond_sum = {val <- c_across(everything())
                     sum(val[val >= threshold])}) %>%
  ungroup()

Or base R

df$cond_sum <- rowSums(replace(df, df < threshold, NA), na.rm = TRUE)
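
One thing to watch with the base R line: it operates on every column of df, so a non-numeric column, or a cond_sum column left over from an earlier run, would be pulled into the calculation. A minimal sketch of restricting the sum to chosen columns follows; the id column and the names cols and vals are hypothetical, added here only for illustration.

```r
# Question's data plus a hypothetical non-numeric id column
df <- data.frame(id = c("a", "b", "c", "d"),
                 x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

# Columns that should take part in the conditional sum
cols <- c("x", "y", "z")
vals <- df[cols]

# Mask cells below the threshold in those columns only, then sum each row
df$cond_sum <- rowSums(replace(vals, vals < threshold, NA), na.rm = TRUE)
```

Because the column selection is explicit, rerunning the last line gives the same result even after cond_sum has been added to df.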
