How to aggregate the hourly rainfall data data into 24hr series

Question

My data frame is as follows

df <- tibble::tribble(
              ~date, ~pcp,
   "9/27/2017 9:00",    0,
  "9/27/2017 10:00",    0,
  "9/27/2017 11:00",    0,
  "9/27/2017 12:00",    0,
  "9/27/2017 13:00",    0,
  "9/27/2017 14:00",    0,
  "9/27/2017 15:00",    0,
  "9/27/2017 16:00",    0,
  "9/27/2017 17:00",    0,
  "9/27/2017 18:00",    0,
  "9/27/2017 19:00",    0,
  "9/27/2017 20:00",    0,
  "9/27/2017 21:00",    0,
  "9/27/2017 22:00",    0,
  "9/27/2017 23:00",    0,
   "9/28/2017 0:00",    0,
   "9/28/2017 1:00",    0,
   "9/28/2017 2:00",    0,
   "9/28/2017 3:00",    0,
   "9/28/2017 4:00",    0,
   "9/28/2017 5:00",    0,
   "9/28/2017 6:00",    0,
   "9/28/2017 7:00", 0.15,
   "9/28/2017 8:00", 8.76,
   "9/28/2017 9:00", 0.02,
  "9/28/2017 10:00",    0,
  "9/28/2017 11:00",    0,
  "9/28/2017 12:00",    0,
  "9/28/2017 13:00",    0,
  "9/28/2017 14:00",    0,
  "9/28/2017 15:00",    0,
  "9/28/2017 16:00",    0,
  "9/28/2017 17:00",    0,
  "9/28/2017 18:00",    0,
  "9/28/2017 19:00",    0,
  "9/28/2017 20:00",    0,
  "9/28/2017 21:00",    0,
  "9/28/2017 22:00",    0,
  "9/28/2017 23:00",    0,
   "9/29/2017 0:00",    0,
   "9/29/2017 1:00",    0,
   "9/29/2017 2:00",    0,
   "9/29/2017 3:00",    0,
   "9/29/2017 4:00",    0,
   "9/29/2017 5:00",    0,
   "9/29/2017 6:00",    0,
   "9/29/2017 7:00",    0,
   "9/29/2017 8:00", 0.31
  )

I would like to have a daily aggregate of the data (sum). Instead of aggregating from 00:00 to 23:59 of the same day, I would like it with the initial time starting at 09:00 day i and ending at 08:59 of day i + 1 (24 h later).

The output is desire is like the following

9/28/2017,8.91
9/29/2017,0.33

I did it manually in Excel, I'm not sure what code to use for this problem. The provided example is an extract of the long data frame. Thanks...

Matias Andina · Accepted Answer

If you want the data for each date

library(tidyverse)
library(lubridate)
df %>% 
    mutate(datetime = parse_date_time(date, "mdy H:M"),
           date = date(datetime)) %>%
    group_by(date) %>%
    summarise(sum_pcp = sum(pcp))

Will produce

# A tibble: 3 x 2
  date       sum_pcp
            
1 2017-09-27       0   
2 2017-09-28       8.93
3 2017-09-29       0.31

If you want to count from 9:00 to 9:00 of the following day, you could introduce a subjective_day by subtracting 9 hours from the original datetime object.

df %>% 
  mutate(datetime = parse_date_time(date, "mdy H:M"),
         date = date(datetime),
         initial_day = date(first(date)),
         time = hour(datetime),
         subjective_day = datetime - hours(9)) %>%
  group_by(subjective_day = floor_date(subjective_day, "1 day")) %>% 
  summarise(sum_pcp = sum(pcp))

Will produce


  subjective_day      sum_pcp
                  
1 2017-09-27 00:00:00    8.91
2 2017-09-28 00:00:00    0.33

Your subjective day will always be 1 day behind, so you can just adjust for that or have in mind that somewhere with a 9 hour time difference in the world would actually have that as the correct datetime :)

How to aggregate the hourly rainfall data data into 24hr series

Answers (2)

Related Questions