Juanchi

Reputation: 1166

Group data by group of days within months in R

I am trying to summarise this daily time series of rainfall by 10-day periods within each month and calculate the accumulated rainfall for each period.

library(tidyverse)
(dat <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by=1),
  rainfall = rgamma(length(date), shape=2, scale=2)))

As a result, the length of the third period will vary along the year: for instance, it has 11 days in January, 9 days in February, and so on. This is my attempt:

library(lubridate)
dat %>% 
  group_by(decade=floor_date(date, "10 days")) %>%
  summarize(acum_rainfall=sum(rainfall), 
            days = n())

This is the resulting output:

# A tibble: 43 x 3
   decade     acum_rainfall  days
   <date>             <dbl> <int>
 1 2016-01-01         48.5     10
 2 2016-01-11         39.9     10
 3 2016-01-21         36.1     10
 4 2016-01-31          1.87     1
 5 2016-02-01         50.6     10
 6 2016-02-11         32.1     10
 7 2016-02-21         22.1      9
 8 2016-03-01         45.9     10
 9 2016-03-11         30.0     10
10 2016-03-21         42.4     10
# ... with 33 more rows

Can someone help me add the leftover days (such as row 4, which holds only January 31) to the third period, so that each month always has exactly 3 periods? This would be the desired output (note row 3):

   decade     acum_rainfall  days
   <date>             <dbl> <int>
 1 2016-01-01         48.5     10
 2 2016-01-11         39.9     10
 3 2016-01-21         37.97    11
 4 2016-02-01         50.6     10
 5 2016-02-11         32.1     10
 6 2016-02-21         22.1      9

Upvotes: 0

Views: 140

Answers (1)

divibisan

Reputation: 12155

One way to do this is to use if_else to apply floor_date with a different unit depending on the day of the month. If day(date) is less than 30, round down to "10 days" as before; if it is 30 or more, round down to "20 days" instead, so that days 30 and 31 fall back to day 21 rather than starting a fourth period on day 31:

dat %>% 
    group_by(decade=if_else(day(date) >= 30,
                            floor_date(date, "20 days"),
                            floor_date(date, "10 days"))) %>%
    summarize(acum_rainfall=sum(rainfall), 
              days = n())

# A tibble: 36 x 3
   decade     acum_rainfall  days
   <date>             <dbl> <int>
 1 2016-01-01          38.8    10
 2 2016-01-11          38.4    10
 3 2016-01-21          43.4    11
 4 2016-02-01          34.4    10
 5 2016-02-11          34.8    10
 6 2016-02-21          25.3     9
 7 2016-03-01          39.6    10
 8 2016-03-11          53.9    10
 9 2016-03-21          38.1    11
10 2016-04-01          36.6    10
# … with 26 more rows
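
As an alternative, here is a minimal sketch that builds the period start date from plain arithmetic on the day of the month, so it does not depend on how floor_date handles multi-day units. The column name period and the fixed seed are my own assumptions for illustration and are not part of the question:

```r
library(dplyr)
library(lubridate)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
dat <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  rainfall = rgamma(length(date), shape = 2, scale = 2))

res <- dat %>%
  mutate(
    # period index within the month: 0 for days 1-10, 1 for days 11-20,
    # 2 for day 21 onwards (pmin caps day 31 at 2 instead of 3)
    period = pmin((day(date) - 1) %/% 10, 2),
    # start date of the period: first of the month plus 0, 10 or 20 days
    decade = floor_date(date, "month") + 10 * period) %>%
  group_by(decade) %>%
  summarize(acum_rainfall = sum(rainfall),
            days = n())

res
```

This should give 36 rows (3 periods for each of the 12 months), with the third period of each month absorbing whatever days are left after day 20.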

Upvotes: 2
