Sepa

Reputation: 111

Mutate each day and each hour using tidyverse functions in R

Currently I have code that returns a tibble with the number of events that occur each day, using the following:

online_toy_purchases %>%
  mutate(interval = lubridate::date(date)) %>%
  group_by(interval) %>%
  summarise(count = n())

This currently returns the following:

# A tibble: 31 x 2
interval    count
2018-12-01    500
2018-12-02    300
2018-12-03    400
2018-12-04    200
2018-12-05    600
...
2018-12-31    100

I would like my code to group by each hour and each day for a more granular view of the data, which would return the following:

# A tibble: 744 x 2
interval             count
2018-12-01 01:00:00    50    
2018-12-01 02:00:00    60  
2018-12-01 03:00:00    20  
2018-12-01 04:00:00    80  
...
2018-12-31 24:00:00    10 

online_toy_purchases is a tibble that contains, among other features, the ID of the transaction and a timestamp with the date, hour, minute and second of the purchase (e.g. "2018-12-01 01:20:58").

Upvotes: 0

Views: 635

Answers (1)

Jon Spring

Reputation: 66775

This will count the number of rows within each hour of the data.

library(tidyverse)
online_toy_purchases %>%
  # assuming that "date" is formatted as a datetime variable already
  count(time = lubridate::floor_date(date, "1 hour")) %>%

  # additional step using padr::pad to add missing hours and
  #   tidyr::replace_na to make NAs into zeroes
  padr::pad() %>%
  replace_na(list(n=0))

For visualization and further analysis, it will be helpful to have rows recording periods with no data. You might alternatively accomplish something similar by converting to a tsibble.
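
If you would rather stay within the tidyverse, a rough equivalent can be sketched with tidyr::complete instead of padr or tsibble. This is only an illustration under some assumptions: toy_purchases is made-up data standing in for online_toy_purchases, and the timestamp is assumed to be stored as a character string, so it is parsed with lubridate::ymd_hms first (skip that step if your column is already a datetime).

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data standing in for online_toy_purchases;
# the timestamps are stored as character strings here (an assumption).
toy_purchases <- tibble::tibble(
  id   = 1:4,
  date = c("2018-12-01 01:20:58", "2018-12-01 01:45:10",
           "2018-12-01 04:05:00", "2018-12-01 04:59:59")
)

hourly_counts <- toy_purchases %>%
  # parse the character timestamps into datetimes (UTC by default)
  mutate(date = ymd_hms(date)) %>%
  # count the rows that fall within each hour
  count(time = floor_date(date, "1 hour")) %>%
  # add a row for every hour between the first and last observed hour,
  # filling the missing counts with zero
  complete(time = seq(min(time), max(time), by = "hour"),
           fill = list(n = 0L))

hourly_counts
```

Note that complete only fills the hours between the first and last observed hour. To cover the whole month, you could pass explicit start and end datetimes to seq instead of min(time) and max(time).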

Upvotes: 1
