Volker Holzendorf
Volker Holzendorf

Reputation: 247

Count days until sum of cases is minimum 5 - R dyplyr solution

I have a dataset with days and cases per day (eg. illness, produced srews, ...)

Now I want to calculate, how long it lasts until a predefined amount is reported/ produced.

Here a simple example: I have ten days, per day in 'cases' is given how many cases were reported on this day.

Aim is to get a colum, how long it lasts for each day until (minimum) 5 cases are reported.

My data:

dates<-c("10.03.2020","11.03.2020","12.03.2020","13.03.2020","14.03.2020",
         "15.03.2020","16.03.2020","17.03.2020","18.03.2020","19.03.2020")
cases<-c(0,1,2,2,6,2,2,1,5,1)
dftest<-data.frame(dates,cases)

        dates cases
1  10.03.2020     0
2  11.03.2020     1
3  12.03.2020     2
4  13.03.2020     2
5  14.03.2020     6
6  15.03.2020     2
7  16.03.2020     2
8  17.03.2020     1
9  18.03.2020     5
10 19.03.2020     1

And my expected Result. (Column 'days5cases' can also be numeric only):

        dates cases days5cases
1  10.03.2020     0      4days
2  11.03.2020     1      3days
3  12.03.2020     2      3days
4  13.03.2020     2      2days
5  14.03.2020     6       1day
6  15.03.2020     2      3days
7  16.03.2020     2      3days
8  17.03.2020     1      2days
9  18.03.2020     5       1day
10 19.03.2020     1       <NA>

EDIT:

Some examples for calculating 'days5cases'

My Question:

How can I sum up column 'cases' until 5 cases are reached? Start for calculating is each day in dates.

Thank you for any idea to solve this problem ....

I tried from package runner sum_run, but here it is only possible to get the sum for the next 5 days.

Simmilary is rollapply from package zoo.

Stay healthy!

Upvotes: 1

Views: 132

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388907

Assuming you have a row for each day using sapply we could iterate over each row, calculate the cumulative sum from that row till last row and return the index number where it first touches total cumulative sum to 5.

sapply(seq_len(nrow(dftest)), function(i) 
       which(cumsum(dftest$cases[i:nrow(dftest)]) >= 5)[1])

#[1]  4  3  3  2  1  3  3  2  1 NA

To do this in tidyverse we can do :

library(dplyr)
library(purrr)
dftest %>%
   mutate(days5cases = map_dbl(row_number(), 
                           ~which(cumsum(cases[.x:n()]) >= 5)[1]))


#        dates cases days5cases
#1  10.03.2020     0          4
#2  11.03.2020     1          3
#3  12.03.2020     2          3
#4  13.03.2020     2          2
#5  14.03.2020     6          1
#6  15.03.2020     2          3
#7  16.03.2020     2          3
#8  17.03.2020     1          2
#9  18.03.2020     5          1
#10 19.03.2020     1         NA

Upvotes: 4

Related Questions