JHowIX

Reputation: 1803

Summarize with conditions and implicit column names in dplyr

I am trying to perform a conditional summation in R with dplyr without naming the columns explicitly. Starting with:

> df <- data.frame(colA=c(1,2,1,1),
+                  colB=c(0,0,3,1),
+                  colC=c(0,1,2,3),
+                  colD=c(2,2,2,2))
> df
  colA colB colC colD
1    1    0    0    2
2    2    0    1    2
3    1    3    2    2
4    1    1    3    2

I am trying to apply the pseudocode:

foreach column c
    foreach value v in c
        if (v > 1)
            calc[c] += (v - 1)

I can accomplish this in a fairly straightforward manner using some simple base R subsetting:

> df.ans <- data.frame(calcA = sum(df$colA[df$colA > 1] - 1),
+                      calcB = sum(df$colB[df$colB > 1] - 1),
+                      calcC = sum(df$colC[df$colC > 1] - 1),
+                      calcD = sum(df$colD[df$colD > 1] - 1))
> df.ans
  calcA calcB calcC calcD
1     1     2     3     4

However, I would like a solution that does not require stating the column names (colA, colB, etc.) explicitly, because there are many of them and they may change in the future. If I were doing a simple sum, the calculation would be possible with dplyr:

df %>%
  summarise_all(funs(sum))

Things I have tried:

Upvotes: 1

Views: 815

Answers (2)

akuiper

Reputation: 214927

You can translate the hard-coded example to summarize_all fairly easily: replace each df$col.. with . in a formula:

df %>% summarise_all(~ sum(.[. > 1] - 1))

#  colA colB colC colD
#1    1    2    3    4

Or with the funs syntax:

df %>% summarise_all(funs(sum(.[. > 1] - 1)))

#  colA colB colC colD
#1    1    2    3    4
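
As a side note, funs() was soft-deprecated in dplyr 0.8.0, and summarise_all() was superseded by across() in dplyr 1.0.0. A minimal sketch of the same calculation in that newer style, assuming dplyr >= 1.0.0 is installed, might look like this. The rename step and the names res and res_named are only illustrative; they are there to reproduce the calcA, calcB, ... names from the question:

```r
library(dplyr)

df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3),
                 colD = c(2, 2, 2, 2))

# Apply the conditional sum to every column without naming any of them.
# .x is the current column; keep values > 1, subtract 1, then sum.
res <- df %>%
  summarise(across(everything(), ~ sum(.x[.x > 1] - 1)))

# Optional (illustrative): rename colA -> calcA, colB -> calcB, ...
res_named <- res %>%
  rename_with(~ sub("^col", "calc", .x))

res_named
#   calcA calcB calcC calcD
# 1     1     2     3     4
```

A column with no values above 1 yields 0, since the sum of an empty vector is 0.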

Upvotes: 2

Onyambu

Reputation: 79208

You can also use sapply from base R:

sapply(df, function(x) sum(x[x > 1] - 1))
colA colB colC colD 
   1    2    3    4 
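
If a one-row data frame is wanted rather than a named vector, a small variation could use vapply, which also checks that each column returns exactly one number. This is only a sketch: the helper name cond_sum and the variables res_vec and res_df are made up for illustration.

```r
df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3),
                 colD = c(2, 2, 2, 2))

# Hypothetical helper: sum of (value - 1) over the values greater than 1.
cond_sum <- function(x) sum(x[x > 1] - 1)

# vapply enforces that every column yields a single numeric value.
res_vec <- vapply(df, cond_sum, numeric(1))

# Turn the named vector into a one-row data frame.
res_df <- as.data.frame(as.list(res_vec))
res_df
#   colA colB colC colD
# 1    1    2    3    4
```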

Upvotes: 1
