Reputation: 377
I have a set of sales report, containing stores that report either "per day" or "per month" sales figures. When I plot them on the same graph, "per month" figures look like spikes, making the graph hard to comprehend.
I wish to convert those "once per month" figures, to spread out evenly over the days in a month, so that I can plot a daily sales graph.
I managed to use tidyverse, lubridate to calculate "sales_per_day" column in the data sets. How can I create rows that are "1 row per day", ie. for 2019-01, create 30 daily rows from each 1 row of monthly data?
sales <- tibble(
distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"),
sales = c(100,200,300,400,5000),
date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
freq = c("daily", "daily", "daily", "daily", "monthly"))
> sales
# A tibble: 5 x 4
distributor sales date freq
<chr> <dbl> <chr> <chr>
1 StoreA 100 2019-01-01 daily
2 StoreA 200 2019-01-02 daily
3 StoreA 300 2019-01-03 daily
4 StoreA 400 2019-01-04 daily
5 StoreB 5000 2019-01-30 monthly
wanted_sales <- tibble(
distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB", "StoreB", "StoreB", "StoreB"),
sales = c(100, 200, 300, 400, 5000 / 30, 5000 / 30, 5000 / 30, 5000 / 30),
date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"),
freq = c("daily", "daily", "daily", "daily", "daily", "daily", "daily", "daily" ))
> wanted_sales
# A tibble: 8 x 4
distributor sales date freq
<chr> <dbl> <chr> <chr>
1 StoreA 100 2019-01-01 daily
2 StoreA 200 2019-01-02 daily
3 StoreA 300 2019-01-03 daily
4 StoreA 400 2019-01-04 daily
5 StoreB 167. 2019-01-01 daily
6 StoreB 167. 2019-01-02 daily
7 StoreB 167. 2019-01-03 daily
8 StoreB 167. 2019-01-04 daily
per_day <- sales %>% filter(freq == "monthly") %>%
group_by(date) %>%
mutate(mdays = as.integer(days_in_month(as_date(date)))) %>%
mutate(sales_per_day = sales / mdays)
> per_day
# A tibble: 1 x 6
# Groups: date [1]
distributor sales date freq mdays sales_per_day
<chr> <dbl> <chr> <chr> <int> <dbl>
1 StoreB 5000 2019-01-30 monthly 31 161.
I wish to make the resulting per_day tibble, with 30 rows, with $date column being a sequence of "2019-01-01", "2019-01-02"... "2019-01-30".
Upvotes: 3
Views: 199
Reputation: 388982
We can change the date
to actual Date class and create a new column startdate
which would have first day of that particular month if freq
is not "daily"
and sales
is divided by 30. For each date
we use complete
to create sequence of dates and change freq
to "daily"
for all.
library(dplyr)
library(tidyr)
library(lubridate)
sales %>%
mutate(date = as.Date(date),
startdate = if_else(freq == "daily", date, floor_date(date, "month")),
sales = if_else(freq == "daily", sales, sales/30)) %>%
group_by(date) %>%
complete(date = seq(startdate, date, "1 day"), sales = sales,
freq = "daily", distributor = distributor) %>%
select(-startdate)
# Groups: date [30]
# date sales freq distributor
# <date> <dbl> <chr> <chr>
# 1 2019-01-01 100 daily StoreA
# 2 2019-01-02 200 daily StoreA
# 3 2019-01-03 300 daily StoreA
# 4 2019-01-04 400 daily StoreA
# 5 2019-01-01 167. daily StoreB
# 6 2019-01-02 167. daily StoreB
# 7 2019-01-03 167. daily StoreB
# 8 2019-01-04 167. daily StoreB
# 9 2019-01-05 167. daily StoreB
#10 2019-01-06 167. daily StoreB
# … with 25 more rows
Upvotes: 4