David Armendariz

Reputation: 1759

How to generate random dates based on the probability of the days in Python?

I would like to generate a random list of length n based on the dates of, say, September. So I have a list like this:

september = ["01/09/2019","02/09/2019",...,"30/09/2019"]

And I would like to generate a list that contains, say, 1000 elements taken randomly from september like this:

dates = ["02/09/2019","02/09/2019","07/09/2019",...,"23/09/2019"]

I could use something like:

dates = np.random.choice(september,1000)

But the catch is that I want dates to be selected based on the probabilities of the days of the week. So for example, I have a dictionary like this:

days = {"Monday":0.1,"Tuesday":0.4,"Wednesday":0.1,"Thursday":0.05,"Friday":0.05,"Saturday":0.2,"Sunday":0.1}

Since "01/09/2019" was a Sunday, I would like to choose this date from september with probability 0.1.
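One way to check which weekday each date string falls on (for example, that 01/09/2019 was a Sunday) is to parse it with the standard library. This is a minimal sketch, assuming the dates are DD/MM/YYYY strings as in the question and that the default C locale is active, so `%A` yields English weekday names matching the dictionary keys:

```python
from datetime import datetime

# Dates are assumed to be DD/MM/YYYY strings, as in the question.
september = [f"{day:02d}/09/2019" for day in range(1, 31)]

# strftime("%A") gives the English weekday name under the default C locale.
weekday_of = {s: datetime.strptime(s, "%d/%m/%Y").strftime("%A") for s in september}

print(weekday_of["01/09/2019"])
print(weekday_of["03/09/2019"])
```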

My attempt was to create a list whose first element is the probability of the first date in September, with the same 7 probabilities repeating every week, like this:

p1 = [0.1,0.1,0.4,0.1,0.05,0.05,0.2,0.1,0.1,0.4,0.1,0.05,0.05,...]

Obviously this doesn't sum to 1, so I would normalize it:

p2 = [x/sum(p1) for x in p1]

And then:

dates = np.random.choice(september,1000,p=p2)
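The approach above can be sketched end to end by building p1 from the dictionary instead of typing it out. This assumes DD/MM/YYYY strings and English weekday names from `%A`; it uses NumPy's `default_rng` Generator, whose `choice` method accepts the same `p` argument as `np.random.choice`:

```python
from datetime import datetime

import numpy as np

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

september = [f"{day:02d}/09/2019" for day in range(1, 31)]

# p1: the weekday weight of each date, looked up instead of typed by hand.
p1 = [days[datetime.strptime(s, "%d/%m/%Y").strftime("%A")] for s in september]

# p2: the same weights rescaled so that they sum to 1.
total = sum(p1)
p2 = [x / total for x in p1]

# Seeded generator so the draw is reproducible.
rng = np.random.default_rng(0)
dates = rng.choice(september, size=1000, p=p2)
```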

However, I am not sure this really works... Can you help me?
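Whether it "works" depends on what the dictionary values mean. Normalizing p1 makes each date's probability proportional to its weekday's weight, so a whole weekday is drawn with probability weight × (number of such days in the month) / sum(p1). The following sketch computes that implied weekday share exactly, assuming the dictionary is meant as the target share per weekday:

```python
from collections import Counter
from datetime import date, timedelta

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

# Weekday name of every day in September 2019.
weekdays = [(date(2019, 9, 1) + timedelta(days=i)).strftime("%A") for i in range(30)]
counts = Counter(weekdays)

# This is sum(p1) from the question.
total = sum(days[w] for w in weekdays)

# Probability that a date sampled with p2 falls on weekday w.
implied = {w: days[w] * counts[w] / total for w in days}

for w in days:
    print(f"{w:<9} target {days[w]:.3f}  implied {implied[w]:.3f}")
```

Mondays and Sundays (five of each in September 2019) come out slightly above their dictionary value, and the other weekdays (four of each) slightly below. If the values are meant as per-date weights, p2 is exactly right; if they are meant as per-weekday shares, divide each weight by its weekday's count before normalizing.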

Upvotes: 2

Views: 1365

Answers (2)

Henry Yik

Reputation: 22493

Actually, I think your approach is fine. But instead of weighting the individual dates, first group the dates by weekday:

import numpy as np
import datetime
from collections import defaultdict

days = {"Monday":0.1,"Tuesday":0.4,"Wednesday":0.1,"Thursday":0.05,"Friday":0.05,"Saturday":0.2,"Sunday":0.1}

date_list = [(datetime.datetime(2019, 9, 1) + datetime.timedelta(days=x)) for x in range(30)]

d = defaultdict(list)

for i in date_list:
    d[i.strftime("%A")].append(i)

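Now use np.random.choice to pick weekdays with the given probabilities. The weekday groups have different lengths (4 or 5 dates), and recent NumPy versions refuse to turn such ragged lists into an array, so sample the indices of the groups rather than the groups themselves:

np.random.seed(500)

weekdays = list(d.keys())
# Draw 1000 weekday indices, each weekday with its probability from `days`.
idx = np.random.choice(len(weekdays),
                       p=[days[w] for w in weekdays],
                       size=1000)
result = [d[weekdays[i]] for i in idx]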

You now have a list of date groups, drawn according to the weekday probabilities. Just do another random choice inside each group:

final = [np.random.choice(i) for i in result]
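Note that this two-stage scheme reads the dictionary differently from the question's p2: each weekday as a whole gets exactly its dictionary value, and that value is split evenly across the weekday's dates. A small sketch computing the exact per-date probabilities this implies for September 2019 (names here are illustrative, not from the answer):

```python
from collections import Counter
from datetime import date, timedelta

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

september = [date(2019, 9, 1) + timedelta(days=i) for i in range(30)]
counts = Counter(day.strftime("%A") for day in september)

# Pick a weekday with probability days[w], then one of its counts[w] dates
# uniformly, so each date ends up with days[w] / counts[w].
per_date = {day: days[day.strftime("%A")] / counts[day.strftime("%A")]
            for day in september}

# Summing over a weekday's dates recovers the dictionary value.
per_weekday = Counter()
for day, p in per_date.items():
    per_weekday[day.strftime("%A")] += p
```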

Upvotes: 1

Itamar Mushkin

Reputation: 2905

If I understand correctly, you want to select dates from the days of September, where the probability of selecting each date is proportional to the number of times that date's weekday appears in September, and what you need is a way to assign the proper probabilities.

I'll show how to assign the probabilities using pandas (just because it's convenient for me).

First, create the array of relevant dates with pd.date_range, which returns a pd.DatetimeIndex, so the elements of the array (an Index in this case) are pd.Timestamp objects:

import pandas as pd
days_of_september = pd.date_range(start='2019-09-01', end='2019-09-30', freq='D')

To each date, we assign its weekday (from 0 for Monday to 6 for Sunday) using the .weekday() method (this is why a Timestamp or a datetime is convenient here):

days_and_weekdays_of_september = pd.DataFrame(
    [(day, day.weekday()) for day in days_of_september], columns=('date', 'weekday'))

Count how many times each weekday appears in the month:

weekday_counts = days_and_weekdays_of_september['weekday'].value_counts()

(No big surprise here: all the values are either 4 or 5.)

Assign each date a probability proportional to its weekday's count:

probability = days_and_weekdays_of_september.apply(lambda date: weekday_counts[date['weekday']], axis=1)
probability = probability/probability.sum()

And then, with pandas, you can select based on those probabilities (called "weights" here):

days_and_weekdays_of_september['date'].sample(n=1000, weights=probability, replace=True)
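The same `sample` call can also take the question's per-weekday dictionary as the weights, instead of the weekday counts. A minimal sketch, assuming English weekday names from `day_name()` (the default locale) so they match the dictionary keys; `sample` normalizes weights that do not sum to 1, so this reproduces the question's p2 weighting without dividing by the sum by hand:

```python
import pandas as pd

days = {"Monday": 0.1, "Tuesday": 0.4, "Wednesday": 0.1, "Thursday": 0.05,
        "Friday": 0.05, "Saturday": 0.2, "Sunday": 0.1}

dates = pd.Series(pd.date_range(start="2019-09-01", end="2019-09-30", freq="D"))

# Per-date weight looked up from the weekday name of each date.
weights = dates.dt.day_name().map(days)

# sample() aligns weights on the index and normalizes them itself.
sampled = dates.sample(n=1000, weights=weights, replace=True, random_state=0)
```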

Upvotes: 1
