Reputation: 3
I have a large dataset of sales information from multiple stores over a period of a few weeks. I need to calculate revenues and average prices over minute intervals, and I can't figure out a smart way to do this. So for example for the data below, I'd want to calculate the revenues and average prices over 10-minute periods, for example the 10-minute period on 2019-02-11 from 09:10:00 to 09:20:00 would result in 2 * 14 + 5 * 9. I've considered labeling each interval with a number and adding a column with the labels, but I don't really know how to implement this. Another option I thought of was to create a separate dataframe with the intervals, and then somehow mapping information from the original data to the interval dataframe, but I didn't get far with this either. Any help on this would be much appreciated!
Example data:
Time | Quantity | Price |
---|---|---|
2019-02-11 09:15:23 | 2 | 14 |
2019-02-11 09:18:01 | 5 | 9 |
2019-02-11 10:15:23 | 1 | 12 |
2019-02-11 09:28:01 | 5 | 9 |
Upvotes: 0
Views: 36
Reputation: 1466
library(tidyverse)
library(lubridate)
df <- read.table(textConnection("time;quantity;unit_price
2019-02-11 09:15:23;2;14
2019-02-11 09:18:01;5;9
2019-02-11 10:15:23;1;12
2019-02-11 09:28:01;5;9"),
sep = ";",
header = TRUE)
df1 <- df %>%
mutate(
time = lubridate::ymd_hms(time),
time_10min = floor_date(time, "hour") + minutes(minute(time) %/% 10 * 10)
)
df1
#> time quantity unit_price time_10min
#> 1 2019-02-11 09:15:23 2 14 2019-02-11 09:10:00
#> 2 2019-02-11 09:18:01 5 9 2019-02-11 09:10:00
#> 3 2019-02-11 10:15:23 1 12 2019-02-11 10:10:00
#> 4 2019-02-11 09:28:01 5 9 2019-02-11 09:20:00
df1 %>%
group_by(time_10min) %>%
summarise(avg_price = mean(unit_price),
revenue = sum(quantity * unit_price))
#> # A tibble: 3 x 3
#> time_10min avg_price revenue
#> <dttm> <dbl> <int>
#> 1 2019-02-11 09:10:00 11.5 73
#> 2 2019-02-11 09:20:00 9 45
#> 3 2019-02-11 10:10:00 12 12
Upvotes: 0