Reputation: 111
Currently I have code that returns a tibble counting the events that occur each day, using the following:
online_toy_purchases %>%
  mutate(interval = lubridate::date(date)) %>%
  group_by(interval) %>%
  summarise(count = n())
This currently returns the following:
# A tibble: 31 x 2
interval count
2018-12-01 500
2018-12-02 300
2018-12-03 400
2018-12-04 200
2018-12-05 600
...
2018-12-31 100
I would like my code to group by each hour of each day for a more granular view of the data, which would return the following:
# A tibble: 744 x 2
interval count
2018-12-01 01:00:00 50
2018-12-01 02:00:00 60
2018-12-01 03:00:00 20
2018-12-01 04:00:00 80
...
2018-12-31 24:00:00 10
online_toy_purchases is a tibble that contains, among other features, the ID of the transaction and a timestamp containing the date and the hour, minute and second of the purchase (e.g. "2018-12-01 01:20:58").
Upvotes: 0
Views: 635
Reputation: 66775
This will count the number of rows within each hour of the data.
library(tidyverse)
online_toy_purchases %>%
  # assuming that "date" is formatted as a datetime variable already
  count(time = lubridate::floor_date(date, "1 hour")) %>%
  # additional step using padr::pad to add missing hours and
  # tidyr::replace_na to make NAs into zeroes
  padr::pad() %>%
  replace_na(list(n = 0))
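To see what this does on a small input, here is a minimal sketch using made-up data. The tibble `toy_purchases` and its values are hypothetical stand-ins for `online_toy_purchases`, and I am assuming the timestamp might be stored as text, so it is parsed with `lubridate::ymd_hms()` first. I also pass `interval = "hour"` to `padr::pad()` explicitly rather than relying on it to infer the interval from the data.

```r
library(dplyr)
library(lubridate)
library(tidyr)

# Hypothetical toy data standing in for online_toy_purchases;
# "date" is deliberately stored as character here
toy_purchases <- tibble(
  id = 1:5,
  date = c(
    "2018-12-01 01:20:58",
    "2018-12-01 01:45:10",
    "2018-12-01 03:05:00",
    "2018-12-01 04:00:00",
    "2018-12-01 04:59:59"
  )
)

hourly <- toy_purchases %>%
  # parse the text timestamps into POSIXct (UTC by default)
  mutate(date = ymd_hms(date)) %>%
  # round each timestamp down to the start of its hour, then count rows per hour
  count(time = floor_date(date, "1 hour")) %>%
  # insert a row for the hour with no purchases (02:00), with n = NA
  padr::pad(interval = "hour") %>%
  # turn the NA counts created by pad() into zeroes
  replace_na(list(n = 0))

hourly
```

With this toy input the result should have four rows, for 01:00 through 04:00, with the 02:00 row showing a count of 0. Note that `floor_date()` labels each bucket by the start of the hour, so a full day runs from 00:00:00 to 23:00:00 rather than 01:00:00 to 24:00:00 as in the question's sketch.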
For visualization and further analysis, it will be helpful to have rows recording periods with no data. You might alternatively accomplish something similar by converting to a tsibble.
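A rough sketch of the tsibble route, again with made-up numbers: `hourly_counts` is a hypothetical table of per-hour counts like the one `count()` produces above, with the 02:00 hour missing. I am assuming the hourly interval can be detected from the spacing of the timestamps, which should hold as long as at least two observed hours are adjacent.

```r
library(dplyr)
library(lubridate)
library(tsibble)

# Hypothetical hourly counts with a gap at 02:00
hourly_counts <- tibble(
  time = ymd_hms(c(
    "2018-12-01 01:00:00",
    "2018-12-01 03:00:00",
    "2018-12-01 04:00:00"
  )),
  n = c(2L, 1L, 2L)
)

hourly_ts <- hourly_counts %>%
  # declare "time" as the time index; the interval is inferred from the data
  as_tsibble(index = time) %>%
  # add rows for the implicit gaps and set their count to zero
  fill_gaps(n = 0L)

hourly_ts
```

`fill_gaps()` plays the role of `padr::pad()` plus `replace_na()` here, and the result stays a tsibble, which is convenient if you plan to use other time series tooling afterwards.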
Upvotes: 1