Reputation: 1117
In my code below, i would like to find the cumsum
for each year. Right now, Variable A
is being summed
for the entire duration. Any help would be appreciated.
library(dplyr)
library(lubridate)
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
A = runif(1095, 0,10))
DF1 <- DF %>%
mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
filter(between(Month,5,10)) %>%
group_by(Year, JDay) %>%
mutate(Precipitation = cumsum(A))
Upvotes: 0
Views: 138
Reputation: 8516
Here is a data.table
solution for that. If you want the cumsum for each year, but show only the interval from month 5 to 10, this would be data.table
code for that:
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:lubridate':
#>
#> hour, isoweek, mday, minute, month, quarter, second, wday, week,
#> yday, year
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
A = runif(1095, 0,10))
data.table(DF)[, `:=` (Year = year(date), Month = month(date), JDay = yday(date))][, Precipitation := cumsum(A), by=Year][between(Month, 5, 10)][]
#> date A Year Month JDay Precipitation
#> 1: 2001-05-01 6.2465000 2001 5 121 568.9538
#> 2: 2001-05-02 0.1877191 2001 5 122 569.1416
#> 3: 2001-05-03 5.3717570 2001 5 123 574.5133
#> 4: 2001-05-04 5.5457454 2001 5 124 580.0591
#> 5: 2001-05-05 5.1508288 2001 5 125 585.2099
#> ---
#> 548: 2003-10-27 0.1979292 2003 10 300 1479.8115
#> 549: 2003-10-28 6.7286553 2003 10 301 1486.5402
#> 550: 2003-10-29 8.7215420 2003 10 302 1495.2617
#> 551: 2003-10-30 8.2572257 2003 10 303 1503.5190
#> 552: 2003-10-31 9.6567923 2003 10 304 1513.1757
If you want the cumsum for the months 5-10 only, you would put the filter before calculating the cumsum:
data.table(DF)[, `:=` (Year = year(date), Month = month(date), JDay = yday(date))][between(Month, 5, 10)][, Precipitation := cumsum(A), by=Year][]
#> date A Year Month JDay Precipitation
#> 1: 2001-05-01 6.2465000 2001 5 121 6.246500
#> 2: 2001-05-02 0.1877191 2001 5 122 6.434219
#> 3: 2001-05-03 5.3717570 2001 5 123 11.805976
#> 4: 2001-05-04 5.5457454 2001 5 124 17.351722
#> 5: 2001-05-05 5.1508288 2001 5 125 22.502550
#> ---
#> 548: 2003-10-27 0.1979292 2003 10 300 916.597973
#> 549: 2003-10-28 6.7286553 2003 10 301 923.326629
#> 550: 2003-10-29 8.7215420 2003 10 302 932.048171
#> 551: 2003-10-30 8.2572257 2003 10 303 940.305396
#> 552: 2003-10-31 9.6567923 2003 10 304 949.962189
Upvotes: 0
Reputation: 459
It seems the issue here is with your grouping clause. Specifically, as there are as many distinct combinations of Year
and JDay
in your data as there are rows in DF
, the subsequent cumsum
operation inside mutate
will simply return the same value as the input column, A
. I believe the following should give you what you're after
library(dplyr)
library(lubridate)
set.seed(50)
DF <- data.frame(date = seq(as.Date("2001-01-01"), to= as.Date("2003-12-31"), by="day"),
A = runif(1095, 0,10))
DF1 <- DF %>%
mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
filter(between(Month,5,10)) %>%
arrange(Year, JDay) %>%
group_by(Year) %>%
mutate(Precipitation = cumsum(A)) %>%
ungroup()
# illustrate that Precipitation does indeed give the cumulative value of A for
# each year by printing the first 5 observations for each year in DF1
DF1 %>%
group_by(Year) %>%
slice(1:5)
#> # A tibble: 15 x 6
#> # Groups: Year [3]
#> date A Year Month JDay Precipitation
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2001-05-01 6.25 2001 5 121 6.25
#> 2 2001-05-02 0.188 2001 5 122 6.43
#> 3 2001-05-03 5.37 2001 5 123 11.8
#> 4 2001-05-04 5.55 2001 5 124 17.4
#> 5 2001-05-05 5.15 2001 5 125 22.5
#> 6 2002-05-01 2.95 2002 5 121 2.95
#> 7 2002-05-02 6.75 2002 5 122 9.71
#> 8 2002-05-03 7.77 2002 5 123 17.5
#> 9 2002-05-04 8.13 2002 5 124 25.6
#> 10 2002-05-05 5.58 2002 5 125 31.2
#> 11 2003-05-01 9.98 2003 5 121 9.98
#> 12 2003-05-02 8.24 2003 5 122 18.2
#> 13 2003-05-03 6.13 2003 5 123 24.4
#> 14 2003-05-04 5.22 2003 5 124 29.6
#> 15 2003-05-05 9.81 2003 5 125 39.4
Upvotes: 0
Reputation: 581
Just remove JDay
from grouping variables
DF1 <- DF %>%
mutate(Year = year(date), Month = month(date), JDay = yday(date)) %>%
filter(between(Month,5,10)) %>%
group_by(Year) %>%
mutate(Precipitation = cumsum(A)) %>%
ungroup()
Upvotes: 2