RegressForward
RegressForward

Reputation: 290

Sum consecutive day values

In keeping with a previous question, imagine I have a data set:

Date       rain code
2009-04-01  0.0 0 
2009-04-02  0.0 0 
2009-04-03  0.0 0 
2009-04-04  0.7 1 
2009-04-05 54.2 1  
2009-04-06  0.0 0 
2009-04-07  5.0 1 
2009-04-08  9.0 0 
2009-04-09  0.0 0 
2009-04-10  0.0 0 
2009-04-11  0.0 0 
2009-04-12  5.3 1  
2009-04-13 10.1 1  
2009-04-14  6.0 1  
2009-04-15  8.7 1  
2009-04-16  0.0 0 
2009-04-17  0.0 0 
2009-04-18  0.0 0 
2009-04-19  2.0 0 
2009-04-20  3.0 0 
2009-04-21  0.0 0 
2009-04-22  0.0 0 
2009-04-23  0.0 0 
2009-04-24  0.0 0 
2009-04-25  4.3 1  
2009-04-26 42.2 1  
2009-04-27 45.6 1  
2009-04-28 12.6 1  
2009-04-29  6.2 1  
2009-04-30  1.0 1  

DT = structure(list(Date = structure(c(14335, 14336, 14337, 14338, 
14339, 14340, 14341, 14342, 14343, 14344, 14345, 14346, 14347, 
14348, 14349, 14350, 14351, 14352, 14353, 14354, 14355, 14356, 
14357, 14358, 14359, 14360, 14361, 14362, 14363, 14364), class = "Date"), 
    rain = c(0, 0, 0, 0.7, 54.2, 0, 5, 9, 0, 0, 0, 5.3, 10.1, 
    6, 8.7, 0, 0, 0, 2, 3, 0, 0, 0, 0, 4.3, 42.2, 45.6, 12.6, 
    6.2, 1), code = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    1L, 1L, 1L, 1L, 1L)), .Names = c("Date", "rain", "code"), row.names = c(NA, 
-30L), class = "data.frame")

I am trying to collapse the data set to get the sum of consecutive values of rain when code is 1. I need to have sum of them until the day after the event, inclusive. For example, I want to get sum of rain values from 2009-04-13 to 2009-04-06, and 2009-04-07 to 2009-04-08 separately. So I am trying to find way to define when the code is equal to 1 and the following day inclusive. The final product ought to look like:

Date       rain code
2009-04-01  0.0 0 
2009-04-02  0.0 0 
2009-04-03  0.0 0  
2009-04-06  54.9 1  
2009-04-08  14.0 1 
2009-04-09  0.0 0 
2009-04-10  0.0 0 
2009-04-11  0.0 0 
2009-04-16 30.1 1 
2009-04-17  0.0 0 
2009-04-18  0.0 0 
2009-04-19  2.0 0 
2009-04-20  3.0 0 
2009-04-21  0.0 0 
2009-04-22  0.0 0 
2009-04-23  0.0 0 
2009-04-24  0.0 0  
2009-04-30  111.9 1  (if last entry of data frame)

Any help on the above problem would be greatly appreciated.

Upvotes: 2

Views: 276

Answers (2)

shayaa
shayaa

Reputation: 2797

If you want to use base R you can always use diff to calculate when the rain starts and stops.

start= which(diff(df$code)==1) +1
end = c(which(diff(df$code)==-1)+1, nrow(df))
l <- mapply(":", start, end)

Getting the data to collapse is just a matter of throwing out all other non stop day indices and replacing the last day with the following, which gets the cumulative rain amount on the day that it stops raining.

lapply(l, function(x) {
  df[x,][length(x),"rain"] <- sum(df[x,"rain"])
  df[x,][length(x),]
})

[[1]]
        Date rain code
6 2009-04-06 54.9    0
[[2]]
        Date rain code
8 2009-04-08   14    0
[[3]]
         Date rain code
16 2009-04-16 30.1    0
[[4]]
         Date  rain code
30 2009-04-30 111.9    1

Upvotes: 0

Frank
Frank

Reputation: 66819

Here's one way:

library(data.table)
setDT(DT)

res = DT[, .(
  Date = Date[.N], 
  rain = sum(rain),
  code = code[1L]
), by=.(g = cumsum(shift(!code, fill=FALSE)))]

res[, g := NULL]

          Date  rain code
 1: 2009-04-01   0.0    0
 2: 2009-04-02   0.0    0
 3: 2009-04-03   0.0    0
 4: 2009-04-06  54.9    1
 5: 2009-04-08  14.0    1
 6: 2009-04-09   0.0    0
 7: 2009-04-10   0.0    0
 8: 2009-04-11   0.0    0
 9: 2009-04-16  30.1    1
10: 2009-04-17   0.0    0
11: 2009-04-18   0.0    0
12: 2009-04-19   2.0    0
13: 2009-04-20   3.0    0
14: 2009-04-21   0.0    0
15: 2009-04-22   0.0    0
16: 2009-04-23   0.0    0
17: 2009-04-24   0.0    0
18: 2009-04-30 111.9    1

How it works:

  • shift is taking the value from the prior row
  • When a logical value like !code is added up, TRUE/FALSE are treated as 1/0
  • .N is the last row in the by= group

The general syntax is DT[, j, by] where j is computed using each by subset of data.

Upvotes: 3

Related Questions