Reputation: 11
I have daily sales data for multiple products in three stores. It looks something like this:
item_id | store_id | category_id | dept_id | date | event_name | daiy_price |
---|---|---|---|---|---|---|
a | tx_1 | food | 1 | 2012/12/24 | 6 | |
a | tx_1 | food | 1 | 2012/12/25 | christmas | 6 |
a | tx_1 | food | 1 | 2012/12/26 | 6 | |
a | tx_1 | food | 1 | 2012/12/27 | 7 | |
a | tx_1 | food | 1 | 2012/12/28 | 7 | |
b | tx_1 | food | 1 | 2012/12/24 | 6 | |
b | tx_1 | food | 1 | 2012/12/25 | christmas | 6 |
b | tx_1 | food | 1 | 2012/12/26 | 6 | |
b | tx_1 | food | 1 | 2012/12/27 | 7 | |
b | tx_1 | food | 1 | 2012/12/28 | 7 |
I want to round each week to the monday date using floor_date from lubridate package in R, then use group_by and summarise to compute weekly price average. However, i want each week to retain event_name if there is any during a particular week. Including event_name in group_by is not helpful since a day in a week may have an event but other days may not and they get grouped separately. Say each week starts on a monday and 2012/12/24 is a monday, how do i achieve this
item_id | store_id | category_id | dept_id | date | event_name | weekly_avg_price |
---|---|---|---|---|---|---|
a | tx_1 | food | 1 | 2012/12/24 | christmas | 6.4 |
b | tx_1 | food | 1 | 2012/12/24 | christmas | 6.4 |
Upvotes: 1
Views: 29
Reputation: 19304
You can achieve your desired output for this toy example by filling in the empty event names and then summarising the weekly prices by event_name and taking the first of the other columns.
library(tidyverse)
library(lubridate)
mutate(df, week_start=floor_date(date, unit = "week", week_start = 1)) |>
group_by(item_id, store_id, category_id, dept_id) |>
fill(event_name, .direction="downup") |>
ungroup() |>
summarise(store_id=first(store_id),
category_id=first(category_id),
event_name=first(event_name),
dept_id=first(dept_id),
weekly_avg_price=mean(daiy_price, na.rm=TRUE), .by=c(item_id, week_start))
Gives:
# A tibble: 2 × 7
item_id week_start store_id category_id event_name dept_id weekly_avg_price
<chr> <date> <chr> <chr> <chr> <dbl> <dbl>
1 a 2012-12-24 tx_1 food christmas 1 6.4
2 b 2012-12-24 tx_1 food christmas 1 6.4
Data:
df <- structure(list(item_id = c("a", "a", "a", "a", "a", "b", "b",
"b", "b", "b"), store_id = c("tx_1", "tx_1", "tx_1", "tx_1",
"tx_1", "tx_1", "tx_1", "tx_1", "tx_1", "tx_1"), category_id = c("food",
"food", "food", "food", "food", "food", "food", "food", "food",
"food"), dept_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), date = structure(c(15698,
15699, 15700, 15701, 15702, 15698, 15699, 15700, 15701, 15702
), class = "Date"), event_name = c(NA, "christmas", NA, NA, NA,
NA, "christmas", NA, NA, NA), daiy_price = c(6, 6, 6, 7, 7, 6,
6, 6, 7, 7)), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
Upvotes: 1