Omry Atia

Reputation: 2443

conditional summation in dplyr

I have the following data frame:

df <- structure(list(Claim2015 = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Claim2016 = c(1, 0, 1,
1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0), Claim2017 = c(0,
0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1), Claim2018 = c(0,
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1), Claim2019 = c(0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))

I would like to compute a conditional (cumulative) count: of those who had a claim in 2015, how many also had one in 2016; of those with claims in both years, how many also had one in 2017; and so on. The count can only stay the same or decrease from year to year.

So the expected output is:

db <- tibble(Had2015 = 20, Had2016 = 15, Had2017 = 9, Had2018 = 7, Had2019 = 5)

What I have started with is:

df1 <- df %>% group_by_all %>% count

This organizes the output in a way that makes it easier to count: I sum n for the rows with 1, then those with 1 & 1, then those with 1 & 1 & 1, and so on. I just don't know how to do this automatically.

Any help would be appreciated.

Upvotes: 3

Views: 94

Answers (2)

Sotos

Reputation: 51582

A similar idea in base R uses the accumulate argument of Reduce to build the data frame up one column at a time, then counts the rows that are all 1 at each step:

sapply(Reduce(`data.frame`, split.default(df, seq_along(df)), accumulate = TRUE),
       function(i) sum(rowSums(i) == ncol(i)))

#[1] 20 15  9  7  5
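
To make the intermediate step visible, here is a minimal sketch on a made-up three-column data frame (df_small and its values are illustrative, not from the question). It assumes the columns contain only 0 and 1, so a row sum equal to the number of columns means every entry in that row is 1.

```r
# Hypothetical toy data, small enough to check by hand
df_small <- data.frame(a = c(1, 1, 1), b = c(1, 0, 1), c = c(0, 0, 1))

# Split into one-column data frames, then glue them back together
# cumulatively: first a, then a + b, then a + b + c
steps <- Reduce(`data.frame`, split.default(df_small, seq_along(df_small)),
                accumulate = TRUE)

# Each accumulated data frame has one more column than the previous one
sapply(steps, ncol)
# 1 2 3

# Rows whose sum equals the column count are all 1 (0/1 data assumed)
counts_small <- sapply(steps, function(i) sum(rowSums(i) == ncol(i)))
counts_small
# 3 2 1
```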

Upvotes: 4

Ronak Shah

Reputation: 388797

Using base R, we can loop over the columns incrementally and, at each step, count the rows in which all columns so far equal 1.

sapply(seq_along(df), function(x) sum(rowSums(df[1:x] == 1) == x))
#[1] 20 15  9  7  5
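
If the result should also carry names like Had2015, one possible variation is a running product across each row. This is only a sketch on a made-up data frame (toy is hypothetical), and it assumes the columns are strictly 0/1 and already in chronological order: once a row hits a 0, its running product stays 0 for every later year.

```r
# Hypothetical toy data with the same column naming as the question
toy <- data.frame(Claim2015 = c(1, 1, 1, 1),
                  Claim2016 = c(1, 0, 1, 1),
                  Claim2017 = c(1, 1, 0, 1))

# apply() over rows returns one column per original row,
# so years end up in the rows of `running`
running <- apply(as.matrix(toy), 1, cumprod)

# Summing across each year's row counts the rows still "all 1" so far
counts <- rowSums(running)
names(counts) <- sub("Claim", "Had", names(toy))
counts
# Had2015 Had2016 Had2017
#       4       3       2
```

With non-binary values (for example claim counts above 1) the product no longer acts as a logical AND, so the comparison-based version above, df[1:x] == 1, is the safer choice there.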

Upvotes: 5
