Hydro
Hydro

Reputation: 1117

find yearly cumsum of a variable in R?

In my code below, i would like to find the cumsum for each year. Right now, Variable A is being summed for the entire duration. Any help would be appreciated.

library(dplyr)
library(lubridate)
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
                       A = runif(1095, 0,10))
DF1 <- DF %>% 
  mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
  filter(between(Month,5,10)) %>%
  group_by(Year, JDay) %>% 
  mutate(Precipitation = cumsum(A))

Upvotes: 0

Views: 138

Answers (3)

user12728748
user12728748

Reputation: 8516

Here is a data.table solution for that. If you want the cumsum for each year, but show only the interval from month 5 to 10, this would be data.table code for that:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:lubridate':
#> 
#>     hour, isoweek, mday, minute, month, quarter, second, wday, week,
#>     yday, year
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
                 A = runif(1095, 0,10))

data.table(DF)[, `:=` (Year = year(date), Month = month(date), JDay = yday(date))][, Precipitation := cumsum(A), by=Year][between(Month, 5, 10)][]
#>            date         A Year Month JDay Precipitation
#>   1: 2001-05-01 6.2465000 2001     5  121      568.9538
#>   2: 2001-05-02 0.1877191 2001     5  122      569.1416
#>   3: 2001-05-03 5.3717570 2001     5  123      574.5133
#>   4: 2001-05-04 5.5457454 2001     5  124      580.0591
#>   5: 2001-05-05 5.1508288 2001     5  125      585.2099
#>  ---                                                   
#> 548: 2003-10-27 0.1979292 2003    10  300     1479.8115
#> 549: 2003-10-28 6.7286553 2003    10  301     1486.5402
#> 550: 2003-10-29 8.7215420 2003    10  302     1495.2617
#> 551: 2003-10-30 8.2572257 2003    10  303     1503.5190
#> 552: 2003-10-31 9.6567923 2003    10  304     1513.1757

If you want the cumsum for the months 5-10 only, you would put the filter before calculating the cumsum:

data.table(DF)[, `:=` (Year = year(date), Month = month(date), JDay = yday(date))][between(Month, 5, 10)][, Precipitation := cumsum(A), by=Year][]
#>            date         A Year Month JDay Precipitation
#>   1: 2001-05-01 6.2465000 2001     5  121      6.246500
#>   2: 2001-05-02 0.1877191 2001     5  122      6.434219
#>   3: 2001-05-03 5.3717570 2001     5  123     11.805976
#>   4: 2001-05-04 5.5457454 2001     5  124     17.351722
#>   5: 2001-05-05 5.1508288 2001     5  125     22.502550
#>  ---                                                   
#> 548: 2003-10-27 0.1979292 2003    10  300    916.597973
#> 549: 2003-10-28 6.7286553 2003    10  301    923.326629
#> 550: 2003-10-29 8.7215420 2003    10  302    932.048171
#> 551: 2003-10-30 8.2572257 2003    10  303    940.305396
#> 552: 2003-10-31 9.6567923 2003    10  304    949.962189

Upvotes: 0

hendrikvanb
hendrikvanb

Reputation: 459

It seems the issue here is with your grouping clause. Specifically, as there are as many distinct combinations of Year and JDay in your data as there are rows in DF, the subsequent cumsum operation inside mutate will simply return the same value as the input column, A. I believe the following should give you what you're after

library(dplyr)
library(lubridate)

set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
                 A = runif(1095, 0,10))
DF1 <- DF %>% 
  mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
  filter(between(Month,5,10)) %>%
  arrange(Year, JDay) %>% 
  group_by(Year) %>% 
  mutate(Precipitation =  cumsum(A)) %>% 
  ungroup()

# illustrate that Precipitation does indeed give the cumulative value of A for
# each year by printing the first 5 observations for each year in DF1
DF1 %>% 
  group_by(Year) %>% 
  slice(1:5)
#> # A tibble: 15 x 6
#> # Groups:   Year [3]
#>    date           A  Year Month  JDay Precipitation
#>    <date>     <dbl> <dbl> <dbl> <dbl>         <dbl>
#>  1 2001-05-01 6.25   2001     5   121          6.25
#>  2 2001-05-02 0.188  2001     5   122          6.43
#>  3 2001-05-03 5.37   2001     5   123         11.8 
#>  4 2001-05-04 5.55   2001     5   124         17.4 
#>  5 2001-05-05 5.15   2001     5   125         22.5 
#>  6 2002-05-01 2.95   2002     5   121          2.95
#>  7 2002-05-02 6.75   2002     5   122          9.71
#>  8 2002-05-03 7.77   2002     5   123         17.5 
#>  9 2002-05-04 8.13   2002     5   124         25.6 
#> 10 2002-05-05 5.58   2002     5   125         31.2 
#> 11 2003-05-01 9.98   2003     5   121          9.98
#> 12 2003-05-02 8.24   2003     5   122         18.2 
#> 13 2003-05-03 6.13   2003     5   123         24.4 
#> 14 2003-05-04 5.22   2003     5   124         29.6 
#> 15 2003-05-05 9.81   2003     5   125         39.4

Upvotes: 0

DzimitryM
DzimitryM

Reputation: 581

Just remove JDay from grouping variables

DF1 <- DF %>% 
  mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
  filter(between(Month,5,10)) %>%
  group_by(Year) %>% 
  mutate(Precipitation = cumsum(A)) %>%
  ungroup()

Upvotes: 2

Related Questions