J.math
J.math

Reputation: 3

Grouping data over time interval

I have a large dataset of sales information from multiple stores over a period of a few weeks. I need to calculate revenues and average prices over minute intervals, and I can't figure out a smart way to do this. So for example for the data below, I'd want to calculate the revenues and average prices over 10-minute periods, for example the 10-minute period on 2019-02-11 from 09:10:00 to 09:20:00 would result in 2 * 14 + 5 * 9. I've considered labeling each interval with a number and adding a column with the labels, but I don't really know how to implement this. Another option I thought of was to create a separate dataframe with the intervals, and then somehow mapping information from the original data to the interval dataframe, but I didn't get far with this either. Any help on this would be much appreciated!

Example data:

Time Quantity Price
2019-02-11 09:15:23 2 14
2019-02-11 09:18:01 5 9
2019-02-11 10:15:23 1 12
2019-02-11 09:28:01 5 9

Upvotes: 0

Views: 36

Answers (1)

crestor
crestor

Reputation: 1466

library(tidyverse)
library(lubridate)


df <- read.table(textConnection("time;quantity;unit_price
2019-02-11 09:15:23;2;14
2019-02-11 09:18:01;5;9
2019-02-11 10:15:23;1;12
2019-02-11 09:28:01;5;9"),
                 sep = ";",
                 header = TRUE)
df1 <- df %>%
  mutate(
    time = lubridate::ymd_hms(time),
    time_10min = floor_date(time, "hour") + minutes(minute(time) %/% 10 * 10)
  )
df1
#>                  time quantity unit_price          time_10min
#> 1 2019-02-11 09:15:23        2         14 2019-02-11 09:10:00
#> 2 2019-02-11 09:18:01        5          9 2019-02-11 09:10:00
#> 3 2019-02-11 10:15:23        1         12 2019-02-11 10:10:00
#> 4 2019-02-11 09:28:01        5          9 2019-02-11 09:20:00

df1 %>%
  group_by(time_10min) %>%
  summarise(avg_price = mean(unit_price),
            revenue = sum(quantity * unit_price))
#> # A tibble: 3 x 3
#>   time_10min          avg_price revenue
#>   <dttm>                  <dbl>   <int>
#> 1 2019-02-11 09:10:00      11.5      73
#> 2 2019-02-11 09:20:00       9        45
#> 3 2019-02-11 10:10:00      12        12

Upvotes: 0

Related Questions