Reputation: 1759
I would like to generate a random list of length n based on the dates of, say, September.
So, you have your list like this:
september = ["01/09/2019","02/09/2019",...,"30/09/2019"]
And I would like to generate a list that contains, say, 1000 elements taken randomly from september, like this:
dates = ["02/09/2019","02/09/2019","07/09/2019",...,"23/09/2019"]
I could use something like:
dates = np.random.choice(september,1000)
But the catch is that I want dates to be selected based on the probabilities of the days of the week. So for example, I have a dictionary like this:
days = {"Monday":0.1,"Tuesday":0.4,"Wednesday":0.1,"Thursday":0.05,"Friday":0.05,"Saturday":0.2,"Sunday":0.1}
So, since "01/09/2019" was a Sunday, I would like to choose this date from september with probability 0.1.
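Because the dates are strings in DD/MM/YYYY format, one way to look up each date's weekday and its weight in days is to parse the string with datetime.strptime and format the weekday name with %A. This is a minimal sketch: the helper name weekday_weight is hypothetical, and %A assumes the default English (C) locale.

```python
from datetime import datetime

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

def weekday_weight(date_string):
    """Hypothetical helper: return (weekday name, weight) for a DD/MM/YYYY string."""
    # %d/%m/%Y matches the day-first format used in the september list
    name = datetime.strptime(date_string, "%d/%m/%Y").strftime("%A")
    return name, days[name]

print(weekday_weight("01/09/2019"))  # ('Sunday', 0.1)
print(weekday_weight("03/09/2019"))  # ('Tuesday', 0.4)
```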
My attempt was to create a list whose first element is the probability of the first date in september, with the same 7-day pattern repeating after that, like this:
p1 = [0.1,0.1,0.4,0.1,0.05,0.05,0.2,0.1,0.1,0.4,0.1,0.05,0.05,...]
Obviously this doesn't sum to 1, so I would do the following:
p2 = [x/sum(p1) for x in p1]
And then:
dates = np.random.choice(september,1000,p=p2)
However, I am not sure this really works... Can you help me?
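The attempt can be made self-contained by deriving p1 from each date's weekday instead of typing it out. September 2019 starts on a Sunday and has 30 days, so it contains five Sundays and five Mondays but only four of every other weekday; that is why sum(p1) = 5*0.1 + 5*0.1 + 4*(0.4 + 0.1 + 0.05 + 0.05 + 0.2) = 4.2. A sketch, assuming NumPy's Generator API (np.random.default_rng) in place of the legacy np.random.choice; the seed is arbitrary:

```python
from datetime import date, timedelta

import numpy as np

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

# All 30 dates of September 2019, plus their DD/MM/YYYY labels
month = [date(2019, 9, 1) + timedelta(days=k) for k in range(30)]
september = [day.strftime("%d/%m/%Y") for day in month]

# Weight of each date = probability of its weekday (pattern repeats every 7 days)
p1 = [days[day.strftime("%A")] for day in month]
total = sum(p1)                 # 4.2 up to float rounding, not 1
p2 = [x / total for x in p1]    # normalised per-date probabilities

rng = np.random.default_rng(500)  # arbitrary seed for reproducibility
dates = rng.choice(september, size=1000, p=p2)
print(p1[:8])  # [0.1, 0.1, 0.4, 0.1, 0.05, 0.05, 0.2, 0.1]
```

With this normalisation, each individual Tuesday is four times as likely as each individual Monday (0.4 vs 0.1), i.e. the ratios in days are preserved per date.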
Upvotes: 2
Views: 1365
Reputation: 22493
Actually, I think your approach is fine. But instead of using the dates directly, first get a list of dates grouped by weekday:
import numpy as np
import datetime
from collections import defaultdict
days = {"Monday":0.1,"Tuesday":0.4,"Wednesday":0.1,"Thursday":0.05,"Friday":0.05,"Saturday":0.2,"Sunday":0.1}
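# All 30 days of September 2019 as datetime objects
date_list = [(datetime.datetime(2019, 9, 1) + datetime.timedelta(days=x)) for x in range(30)]
# Group the dates by weekday name, e.g. d["Sunday"] holds the five Sundays
d = defaultdict(list)
for i in date_list:
    d[i.strftime("%A")].append(i)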
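Now pass the weekday names to np.random.choice, weighted by days. (Passing list(d.values()) directly fails on NumPy 1.24 and later: the groups hold 4 or 5 dates each, so they cannot be turned into a regular array.)
np.random.seed(500)
weekdays = list(d.keys())
# Draw 1000 weekday names, each with the probability given in days
result = np.random.choice(weekdays, p=[days[w] for w in weekdays], size=1000)
You now have 1000 weekdays drawn with the desired weights. Just do another random choice among the dates of each drawn weekday:
# Pick one date uniformly from the group of each drawn weekday
final = [d[w][np.random.randint(len(d[w]))] for w in result]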
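The difference from the question's version is worth spelling out. In this two-stage draw, each weekday as a whole is picked with exactly the probability given in days, whereas normalising per-date weights, as in the question, makes a weekday's overall share depend on how many times it occurs in the month. A sketch computing the exact probabilities of both schemes for September 2019, with no sampling involved; the helper weekday_share is illustrative only:

```python
from collections import Counter
from datetime import date, timedelta

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

month = [date(2019, 9, 1) + timedelta(days=k) for k in range(30)]
names = [day.strftime("%A") for day in month]
counts = Counter(names)  # Sunday and Monday occur 5 times, the rest 4 times

# Exact probability of drawing each individual date under the two schemes
total = sum(days[n] for n in names)               # 4.2, the question's normaliser
per_date = [days[n] / total for n in names]       # question: weight every date
two_stage = [days[n] / counts[n] for n in names]  # answer: weekday, then a date

def weekday_share(probs, weekday):
    """Total probability that a single draw lands on the given weekday."""
    return sum(p for p, n in zip(probs, names) if n == weekday)

for w in ["Sunday", "Tuesday"]:
    print(w, round(weekday_share(two_stage, w), 3), round(weekday_share(per_date, w), 3))
# Sunday 0.1 0.119
# Tuesday 0.4 0.381
```

Which scheme is right depends on whether the values in days are meant as per-date weights or as the overall share of each weekday.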
Upvotes: 1
Reputation: 2905
If I understand correctly, you want to select dates from the days of September, where the probability of selecting each date is proportional to the number of times its weekday appears in September, and what you need is a way to assign the proper probabilities.
I'll show how to assign the probabilities using pandas (just because it's convenient for me).
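First, create the array of relevant dates with pd.date_range, which returns a pd.DatetimeIndex, so the elements of the array (an Index in this case) are pd.Timestamp objects:
import pandas as pd
# Current pandas no longer accepts start/end in the DatetimeIndex constructor; date_range does the same job
days_of_september = pd.date_range(start='2019-09-01', end='2019-09-30', freq='D')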
To each date, we assign its weekday (from 0 for Monday to 6 for Sunday) using the .weekday() method (this is why a Timestamp or a datetime is convenient here):
days_and_weekdays_of_september = pd.DataFrame(
[(day, day.weekday()) for day in days_of_september], columns=('date', 'weekday'))
Count how many times each weekday appears in the month:
weekday_counts = days_and_weekdays_of_september['weekday'].value_counts()
(No big surprise here: all the counts are either 4 or 5.)
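The counts can be checked without pandas. September 2019 begins on a Sunday and 30 = 4*7 + 2, so the first two weekdays of the month (Sunday and Monday) occur five times and every other weekday four times. A stdlib-only sketch using date.weekday(), which numbers Monday as 0 and Sunday as 6, the same convention as Timestamp.weekday():

```python
from collections import Counter
from datetime import date, timedelta

# weekday(): Monday == 0 ... Sunday == 6
september = [date(2019, 9, 1) + timedelta(days=k) for k in range(30)]
weekday_counts = Counter(day.weekday() for day in september)

print(sorted(weekday_counts.items()))
# [(0, 5), (1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 5)]
```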
Give each date a weight equal to its weekday's count, then normalise so the weights sum to 1:
probability = days_and_weekdays_of_september.apply(lambda date: weekday_counts[date['weekday']], axis=1)
probability = probability/probability.sum()
And then, with pandas, you can sample based on those probabilities (called "weights" here):
days_and_weekdays_of_september['date'].sample(n=1000, weights=probability, replace=True)
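For comparison, the same weighting (each date proportional to how often its weekday occurs in the month) can be sampled with NumPy alone. The per-date counts add up to 10*5 + 20*4 = 130, so each Sunday and Monday gets probability 5/130 and every other date 4/130. A sketch, assuming np.random.default_rng as the random source; the seed is arbitrary:

```python
from collections import Counter
from datetime import date, timedelta

import numpy as np

september = [date(2019, 9, 1) + timedelta(days=k) for k in range(30)]
weekday_counts = Counter(day.weekday() for day in september)

# Weight each date by the number of times its weekday occurs, then normalise
weights = np.array([weekday_counts[day.weekday()] for day in september], dtype=float)
probability = weights / weights.sum()

rng = np.random.default_rng(0)  # arbitrary seed
idx = rng.choice(len(september), size=1000, p=probability)  # sampled positions
sample = [september[i] for i in idx]
print(round(probability[0] * 130, 6), round(probability[2] * 130, 6))  # 5.0 4.0
```

Sampling positions rather than the date objects themselves keeps the result as plain datetime.date values.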
Upvotes: 1