I am trying to perform a conditional summation in R with dplyr, without naming the columns explicitly. Starting with:
> df <- data.frame(colA=c(1,2,1,1),
+ colB=c(0,0,3,1),
+ colC=c(0,1,2,3),
+ colD=c(2,2,2,2))
> df
colA colB colC colD
1 1 0 0 2
2 2 0 1 2
3 1 3 2 2
4 1 1 3 2
I am trying to apply this pseudocode:
foreach column c
    calc[c] = 0
    foreach value row.val in c
        if (row.val > 1)
            calc[c] += (row.val - 1)
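For reference, the pseudocode can be translated literally into R with plain loops. This is a minimal sketch rather than an idiomatic solution; the loop variable name `col` is chosen here only to avoid shadowing the base function `c()`.

```r
df <- data.frame(colA = c(1, 2, 1, 1), colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3), colD = c(2, 2, 2, 2))

# One accumulator per column, initialised to 0 and named after the columns
calc <- setNames(numeric(ncol(df)), names(df))

for (col in names(df)) {           # foreach column c
  for (row.val in df[[col]]) {     # walk that column's values one at a time
    if (row.val > 1) {
      calc[col] <- calc[col] + (row.val - 1)
    }
  }
}

calc
```

It already works for any set of column names, but the nested loops are slow on large data, which is why the vectorised versions in the answers are preferable.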
I can accomplish this fairly straightforwardly with some simple base R subsetting:
> df.ans <- data.frame(calcA = sum(df$colA[df$colA > 1] - 1),
+ calcB = sum(df$colB[df$colB > 1] - 1),
+ calcC = sum(df$colC[df$colC > 1] - 1),
+ calcD = sum(df$colD[df$colD > 1] - 1))
> df.ans
calcA calcB calcC calcD
1 1 2 3 4
However, I would like a solution that does not have to name the columns explicitly (colA, colB, etc.), because there are many of them and they may change in the future. If I were doing a simple sum, the calculation would be possible with dplyr:
library(dplyr)
df %>%
  summarise_all(funs(sum))
Things I have tried:
- The filter_at() family of dplyr functions. I found these insufficient for this purpose because they filter entire rows, whereas I need to filter rows for each column independently.
- The summarise() function. This is probably the closest I have gotten, but the evaluations always resolve to booleans, which throw off the summation. For example, summarise_all(funs(sum(. > 1))) counts how many values in each column exceed 1 instead of summing (value - 1).
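Summing a logical vector counts its TRUE values, which is why sum(. > 1) returns counts rather than the intended totals. The boolean mask is still useful as a multiplier: (x - 1) * (x > 1) is x - 1 where x exceeds 1 and 0 elsewhere. A minimal base R sketch, using colSums() on a numeric matrix as a stand-in for summarise_all:

```r
df <- data.frame(colA = c(1, 2, 1, 1), colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3), colD = c(2, 2, 2, 2))
m <- as.matrix(df)   # all columns are numeric, so this is a numeric matrix

# Summing the logical mask only counts how many values exceed 1 per column
counts <- colSums(m > 1)

# Multiplying by the mask keeps (value - 1) where value > 1 and zeroes the rest
excess <- colSums((m - 1) * (m > 1))

counts
excess
```

Here counts is 1, 1, 2, 4, while excess gives the desired 1, 2, 3, 4. In dplyr the same masking idea should carry over as df %>% summarise_all(~ sum((. - 1) * (. > 1))).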
You can translate the hard-coded example to summarize_all pretty easily, i.e., replace df$col.. with . (which stands for the current column):
df %>% summarise_all(~ sum(.[. > 1] - 1))
# colA colB colC colD
#1 1 2 3 4
Or with the funs() syntax (soft-deprecated since dplyr 0.8.0):
df %>% summarise_all(funs(sum(.[. > 1] - 1)))
# colA colB colC colD
#1 1 2 3 4
You can also use sapply from base R:
sapply(df, function(x) sum(x[x > 1] - 1))
colA colB colC colD
1 2 3 4
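If the result should look like df.ans from the question, one option is vapply(), which works like sapply() but checks that each column yields exactly one numeric value. The sketch below then turns the named vector back into a one-row data frame; the colX to calcX renaming is an assumption based on the question's naming scheme.

```r
df <- data.frame(colA = c(1, 2, 1, 1), colB = c(0, 0, 3, 1),
                 colC = c(0, 1, 2, 3), colD = c(2, 2, 2, 2))

# Same per-column function as the sapply() answer; numeric(1) declares the
# expected result type, so vapply() errors instead of silently returning a list
excess <- vapply(df, function(x) sum(x[x > 1] - 1), numeric(1))

# Convert the named vector to a one-row data frame and rename colX -> calcX
# (assumed naming scheme, mirroring df.ans in the question)
df.ans <- as.data.frame(as.list(excess))
names(df.ans) <- sub("^col", "calc", names(df.ans))

df.ans
```

A column with no values above 1 yields 0 here, because x[x > 1] is then empty and sum() of an empty vector is 0.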