taiyodayo
taiyodayo

Reputation: 377

creating "Per Day" rows, from selective "Per Month" figures using tidyverse

I have a set of sales report, containing stores that report either "per day" or "per month" sales figures. When I plot them on the same graph, "per month" figures look like spikes, making the graph hard to comprehend.

I wish to convert those "once per month" figures, to spread out evenly over the days in a month, so that I can plot a daily sales graph.

I managed to use tidyverse, lubridate to calculate "sales_per_day" column in the data sets. How can I create rows that are "1 row per day", ie. for 2019-01, create 30 daily rows from each 1 row of monthly data?

sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB"), 
  sales = c(100,200,300,400,5000), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-30"),
  freq = c("daily", "daily", "daily", "daily", "monthly"))

> sales
# A tibble: 5 x 4
  distributor sales date       freq   
  <chr>       <dbl> <chr>      <chr>  
1 StoreA        100 2019-01-01 daily  
2 StoreA        200 2019-01-02 daily  
3 StoreA        300 2019-01-03 daily  
4 StoreA        400 2019-01-04 daily  
5 StoreB       5000 2019-01-30 monthly


wanted_sales <- tibble(
  distributor = c("StoreA", "StoreA", "StoreA", "StoreA", "StoreB", "StoreB", "StoreB", "StoreB"), 
  sales = c(100, 200, 300, 400, 5000 / 30, 5000 / 30, 5000 / 30, 5000 / 30), 
  date = c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"),
  freq = c("daily", "daily", "daily", "daily", "daily", "daily", "daily", "daily" ))

> wanted_sales
# A tibble: 8 x 4
  distributor sales date       freq 
  <chr>       <dbl> <chr>      <chr>
1 StoreA       100  2019-01-01 daily
2 StoreA       200  2019-01-02 daily
3 StoreA       300  2019-01-03 daily
4 StoreA       400  2019-01-04 daily
5 StoreB       167. 2019-01-01 daily
6 StoreB       167. 2019-01-02 daily
7 StoreB       167. 2019-01-03 daily
8 StoreB       167. 2019-01-04 daily

per_day <- sales %>% filter(freq == "monthly") %>%
  group_by(date) %>%
  mutate(mdays = as.integer(days_in_month(as_date(date)))) %>%
  mutate(sales_per_day = sales / mdays)

> per_day
# A tibble: 1 x 6
# Groups:   date [1]
  distributor sales date       freq    mdays sales_per_day
  <chr>       <dbl> <chr>      <chr>   <int>         <dbl>
1 StoreB       5000 2019-01-30 monthly    31          161.

I wish to make the resulting per_day tibble, with 30 rows, with $date column being a sequence of "2019-01-01", "2019-01-02"... "2019-01-30".

Upvotes: 3

Views: 199

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388982

We can change the date to actual Date class and create a new column startdate which would have first day of that particular month if freq is not "daily" and sales is divided by 30. For each date we use complete to create sequence of dates and change freq to "daily" for all.

library(dplyr)
library(tidyr)
library(lubridate)

sales %>%
  mutate(date = as.Date(date), 
         startdate = if_else(freq == "daily", date, floor_date(date, "month")), 
         sales = if_else(freq == "daily", sales, sales/30)) %>%
   group_by(date) %>%
   complete(date = seq(startdate, date, "1 day"), sales = sales, 
            freq = "daily", distributor = distributor) %>%
   select(-startdate)

# Groups:   date [30]
#   date       sales freq  distributor
#   <date>     <dbl> <chr> <chr>      
# 1 2019-01-01  100  daily StoreA     
# 2 2019-01-02  200  daily StoreA     
# 3 2019-01-03  300  daily StoreA     
# 4 2019-01-04  400  daily StoreA     
# 5 2019-01-01  167. daily StoreB     
# 6 2019-01-02  167. daily StoreB     
# 7 2019-01-03  167. daily StoreB     
# 8 2019-01-04  167. daily StoreB     
# 9 2019-01-05  167. daily StoreB     
#10 2019-01-06  167. daily StoreB     
# … with 25 more rows

Upvotes: 4

Related Questions