Aggregating hourly count in dplyr

Question

I have following dataframe in R

Date                      ID      
01-01-2017 12:39:00       CDF
01-01-2017 01:39:00       WED
01-01-2017 02:39:00       QWE
01-01-2017 05:39:00       TYU
01-01-2017 17:39:00       ERT
02-01-2017 02:30:34       DEF

I want to calculate hourly count of ID. My desired dataframe would be

Date           hours               Count
01-01-2017     00:00 - 01:00       1
01-01-2017     01:00 - 02:00       1
01-01-2017     02:00 - 03:00       1
01-01-2017     03:00 - 04:00       0
01-01-2017     04:00 - 05:00       0
01-01-2017     05:00 - 06:00       1
.
01-01-2017     23:00 - 00:00       0 
.
02-01-2017     12:00 - 01:00       0 
02-01-2017     01:00 - 02:00       0
02-01-2017     02:00 - 03:00       1

Where there is no id exists,I want hourly bucket to be zero. Every date will contain 24 hours movement.

How can I achieve this in R?

konvas · Accepted Answer

tidyverse provides some functions which are useful here, like count/tally and complete

library(tidyverse)
library(lubridate)

dat <- read_csv('Date, ID      
  01-01-2017 12:39:00, CDF
  01-01-2017 01:39:00, WED
  01-01-2017 02:39:00, QWE
  01-01-2017 05:39:00, TYU
  01-01-2017 17:39:00, ERT
  02-01-2017 02:30:34, DEF'
) 

dat %>% 
   mutate(
       Date = dmy_hms(Date),
       day = floor_date(Date, 'day'), 
       hour = hour(Date)
   ) %>%
   group_by(day, hour) %>%
   tally %>%
   complete(day, hour = 0:23, fill = list('n' = 0))


## A tibble: 48 x 3
## Groups:   day [2]
#          day  hour     n
#         
# 1 2017-01-01     0     0
# 2 2017-01-01     1     1
# 3 2017-01-01     2     1
# 4 2017-01-01     3     0
# 5 2017-01-01     4     0
# 6 2017-01-01     5     1
# 7 2017-01-01     6     0
# 8 2017-01-01     7     0
# 9 2017-01-01     8     0
#10 2017-01-01     9     0
## ... with 38 more rows

Aggregating hourly count in dplyr

Answers (2)

Related Questions