latlio

Reputation: 1587

How to `summarize()` by another incrementing variable?

Data

library(dplyr)  # also provides tibble()

test <- tibble(index = c(1, 1, 2, 2, 3, 3),
               R = c(1, 0, 1, 1, 1, 1),
               X = c(0, 1, 1, 0, 1, 1))
# A tibble: 6 x 3
  index     R     X
  <dbl> <dbl> <dbl>
1     1     1     0
2     1     0     1
3     2     1     1
4     2     1     0
5     3     1     1
6     3     1     1

The summarize() function should follow this logic:

summarize(power = sum(R & X)/sum(X))

The desired output is:

# A tibble: 3 x 2
  index power
  <dbl> <dbl>
1     1  0   
2     2  0.5 
3     3  0.75

Here, power is computed cumulatively: first over index 1 alone, then over indices 1 and 2, then over indices 1, 2, and 3. The solution should scale to a large number of indices. Thanks!
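To make the expected numbers concrete, here is a brute-force sketch that recomputes the ratio from scratch for every cutoff. It is only meant as a check of the definition, not as a scalable solution, and the names `keep` and `power` are picked purely for illustration.

```r
index <- c(1, 1, 2, 2, 3, 3)
R     <- c(1, 0, 1, 1, 1, 1)
X     <- c(0, 1, 1, 0, 1, 1)

# For each cutoff k, use every row whose index is <= k
power <- sapply(unique(index), function(k) {
  keep <- index <= k
  sum(R[keep] & X[keep]) / sum(X[keep])
})

power
```

For the sample data this gives 0/1, 1/2 and 3/4, which matches the desired output above.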

Upvotes: 0

Views: 41

Answers (1)

Ronak Shah

Reputation: 388797

You can first compute the two sums, `sum(R & X)` and `sum(X)`, for each index, and then take the ratio of their cumulative sums with `cumsum()`.

library(dplyr)

test %>%
  group_by(index) %>%
  # per-index numerator and denominator
  summarise(val1 = sum(R & X),
            val2 = sum(X)) %>%
  # running totals across indices, then their ratio
  transmute(index, power = cumsum(val1) / cumsum(val2))

#  index power
#  <dbl> <dbl>
#1     1  0   
#2     2  0.5 
#3     3  0.75
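As a possible variation, the running totals can also be taken row by row before grouping, keeping only the last row of each index. This is a sketch that assumes the rows are already sorted by `index` (otherwise add `arrange(index)` first). The name `result` is just for illustration.

```r
library(dplyr)

test <- tibble(index = c(1, 1, 2, 2, 3, 3),
               R = c(1, 0, 1, 1, 1, 1),
               X = c(0, 1, 1, 0, 1, 1))

# Assumption: rows are sorted by index, so the last row of each
# index holds the running totals up to and including that index.
result <- test %>%
  mutate(power = cumsum(R & X) / cumsum(X)) %>%
  group_by(index) %>%
  summarise(power = last(power))

result
```

Note that intermediate rows can be `NaN` while the running `sum(X)` is still 0 (as in the first row here). That only matters if it happens on the last row of an index.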

Upvotes: 1
