Summarize with conditions and implicit column names in dplyr

Question

I am trying to perform a conditional summation in R with dplyr without naming the columns explicitly. Starting with

> df <- data.frame(colA=c(1,2,1,1),
+                  colB=c(0,0,3,1),
+                  colC=c(0,1,2,3),
+                  colD=c(2,2,2,2))
> df
  colA colB colC colD
1    1    0    0    2
2    2    0    1    2
3    1    3    2    2
4    1    1    3    2

I am trying to apply the pseudocode:

foreach column c
    if(row.val > 1)
        calc += (row.val - 1)

I can accomplish this in a fairly straightforward manner using some simple base R subsetting:

> df.ans <- data.frame(calcA = sum(df$colA[df$colA > 1] - 1),
+                      calcB = sum(df$colB[df$colB > 1] - 1),
+                      calcC = sum(df$colC[df$colC > 1] - 1),
+                      calcD = sum(df$colD[df$colD > 1] - 1))
> df.ans
  calcA calcB calcC calcD
1     1     2     3     4

However, I would like a solution that does not have to state the column names (colA, colB, etc.) explicitly, because there are many of them and they may change in the future. If I were doing a simple sum, the calculation would be possible with dplyr:

df %>%
  summarise_all(funs(sum))

Things I have tried:

The filter_at components of dplyr, but I found them insufficient for this purpose because they keep or drop entire rows, whereas I need to filter the values of each column independently.
This answer, but I found it insufficient because it uses explicit column names.
Conditionals inside a custom summarise function. This is probably the closest I have gotten, but the condition always evaluates to a logical vector, which throws off the summation. For example, summarise_all(funs(sum(. > 1))) counts the values greater than 1 instead of summing them.

akuiper · Accepted Answer

You can translate the hard-coded example to summarise_all fairly directly: replace each df$col.. with ., the placeholder for the current column:

df %>% summarise_all(~ sum(.[. > 1] - 1))

#  colA colB colC colD
#1    1    2    3    4

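Or with the funs syntax (note that funs() is deprecated as of dplyr 0.8.0, so the formula form above is preferable in current code):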

df %>% summarise_all(funs(sum(.[. > 1] - 1)))

#  colA colB colC colD
#1    1    2    3    4
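
In more recent dplyr releases the scoped verbs such as summarise_all are superseded by across(). The following is a minimal sketch of the same calculation in that style, assuming dplyr 1.0.0 or later is installed; the result name res is only illustrative and is not part of the original answer:

```r
library(dplyr)

# Example data from the question
df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3),
                 colD = c(2, 2, 2, 2))

# For every column: keep the values greater than 1, subtract 1 from each,
# and sum what is left. .x stands for the current column.
res <- df %>%
  summarise(across(everything(), ~ sum(.x[.x > 1] - 1)))

res
#   colA colB colC colD
# 1    1    2    3    4
```

Because everything() selects all columns, no column names appear in the call, so it should keep working if columns are added or renamed. If only some columns should be summarised, everything() can be swapped for another selection helper such as starts_with("col").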
