I have a data set with all dates in 2021, and I would like to create random samples of repeating dates within each month. The distribution of dates within each month should follow a pattern that assigns a specific percentage to each day of the week. For example, I would like to generate 1000 dates from January 2021, and approximately 8% (80) of these dates should be Mondays. Please consider the following working example:
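library(tidyverse)
library(lubridate)

# One row per calendar day in 2021
dt2021 <-
  tibble(SalesDate = seq.Date(
    ymd("2021-01-01"),
    ymd("2021-12-31"), 1)) %>%
  mutate(
    wkDay = weekdays(SalesDate),  # English day names assume an English locale
    year = year(SalesDate),
    month = month(SalesDate))
dt2021 %>% glimpse()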
# Target share of each day of the week (the shares sum to 1)
dtWkDays <- tibble(
  wkDay = c("Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday",
            "Sunday"),
  Freq = c(0.08, 0.07, 0.09, 0.12, 0.31, 0.32,
           0.01))
dtWkDays
My pseudocode for what I am trying to do would look something like the following:
set.seed(123)
dt2021_01 <- dt2021 %>% filter(month == 1) %>%
  # generate a random sample of 1000 dates
  # use the wkDay in dtWkDays for the grouping (stratification)
  # use the Freq in dtWkDays for the weights
  # sample with replacement (replace = TRUE)
If the solution is correct, the following R script should produce around 80 Mondays, 70 Tuesdays, 90 Wednesdays, 120 Thursdays, etc.
dt2021_01 %>% count(wkDay)
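As a quick arithmetic check on those targets, here is a minimal sketch in base R (so it runs without extra packages). The names `freq`, `n_draws`, and `expected` are illustrative and not part of the example above; the shares are copied from dtWkDays.

```r
# Target shares per weekday, copied from dtWkDays
freq <- c(Monday = 0.08, Tuesday = 0.07, Wednesday = 0.09, Thursday = 0.12,
          Friday = 0.31, Saturday = 0.32, Sunday = 0.01)

# The shares must sum to 1 to describe a full distribution over weekdays
stopifnot(isTRUE(all.equal(sum(freq), 1)))

# Expected number of dates per weekday in a sample of 1000
n_draws <- 1000
expected <- n_draws * freq
print(expected)
```

Because the sample is random, the observed counts will scatter around these values rather than match them exactly.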
I have tried several combinations using slice_sample, sample_frac, group_by, weight_by, etc., and nothing has generated the correct results for me.
Reputation: 30494
I believe this might work. Join the frequency tibble to your date tibble. After filtering for the month of interest, calculate a revised frequency for each date: the frequency for its day of the week divided by the number of times that day of the week appears in the month. Finally, use slice_sample with this new frequency as weight_by (the weights add up to 1 here, though they would otherwise be standardized to add up to 1 anyway).
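To see why dividing by the number of occurrences is the right adjustment, the following base R sketch checks that the per-date weights for January 2021 sum back to the target share for each weekday. It rebuilds the data itself so it can run on its own; `dates`, `wk_day`, `n_occ`, and `new_freq` are illustrative names, and the weekday names are derived from `as.POSIXlt()` so the check does not depend on the system locale.

```r
# All dates in January 2021
dates <- seq(as.Date("2021-01-01"), as.Date("2021-01-31"), by = "day")

# Locale-independent weekday names: POSIXlt wday runs 0 (Sunday) to 6 (Saturday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
wk_day <- day_names[as.POSIXlt(dates)$wday + 1]

# Target shares per weekday, copied from dtWkDays
freq <- c(Monday = 0.08, Tuesday = 0.07, Wednesday = 0.09, Thursday = 0.12,
          Friday = 0.31, Saturday = 0.32, Sunday = 0.01)

# How often each weekday occurs in the month
n_occ <- table(wk_day)

# Per-date weight: the weekday's share split evenly across its dates
new_freq <- unname(freq[wk_day]) / as.numeric(n_occ[wk_day])

# Summing the per-date weights by weekday recovers the target shares
weight_by_day <- tapply(new_freq, wk_day, sum)
print(weight_by_day[names(freq)])
```

Without the division, a weekday that occurs five times in the month would receive more total weight than one that occurs four times, which would skew the sample away from the target shares.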
library(tidyverse)
set.seed(123)
dt2021 %>%
  filter(month == 1) %>%
  left_join(dtWkDays, by = "wkDay") %>%
  group_by(wkDay) %>%
  # split each weekday's share evenly across its dates in the month
  mutate(newFreq = Freq / n()) %>%
  ungroup() %>%
  slice_sample(n = 1000, weight_by = newFreq, replace = TRUE) %>%
  count(wkDay)
Output
  wkDay         n
  <chr>     <int>
1 Friday      312
2 Monday       81
3 Saturday    320
4 Sunday       10
5 Thursday    120
6 Tuesday      62
7 Wednesday    95
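One way to judge whether counts like these are "close enough" to the targets is to compare them with the expected counts. The sketch below is only an illustration in base R, not part of the original answer: it takes the counts printed above, computes how many binomial standard deviations each one lies from its expected value, and runs a chi-squared goodness-of-fit test with `chisq.test()`. The names `observed`, `z`, and `gof` are illustrative.

```r
# Target shares per weekday, copied from dtWkDays
freq <- c(Monday = 0.08, Tuesday = 0.07, Wednesday = 0.09, Thursday = 0.12,
          Friday = 0.31, Saturday = 0.32, Sunday = 0.01)

# Counts taken from the output above, in the same weekday order as freq
observed <- c(Monday = 81, Tuesday = 62, Wednesday = 95, Thursday = 120,
              Friday = 312, Saturday = 320, Sunday = 10)

n_draws <- sum(observed)
expected <- n_draws * freq

# Each count is marginally binomial, so its standard deviation is sqrt(n p (1 - p))
z <- (observed - expected) / sqrt(n_draws * freq * (1 - freq))
print(round(z, 2))

# Goodness-of-fit test of the observed counts against the target shares
gof <- chisq.test(observed, p = freq)
print(gof)
```

Small absolute values of `z` and a large p-value would suggest the sample is consistent with the requested weekday distribution, with the remaining differences attributable to random sampling.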