Reputation: 37
Ok so I need to create some random data for simulation purposes. I know the mean values and standard deviations of some real-life scenarios that I am trying to reproduce. The issue I am having is that the random numbers generated do not correspond realistically with the dates. For example, the weather (MinTp) fluctuates wildly from day to day, which is not realistic. I want the numbers to be generated in a pattern so that the mean appears in the middle of the data set. Please see my code below, along with the output table and a scatterplot of MinTp over the year. I have been using np.random.normal() to generate the data; maybe I need to use a different function?
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(2)

start2018 = datetime.datetime(2018, 1, 1)
end2018 = datetime.datetime(2018, 12, 31)
dates2018 = pd.date_range(start2018, end2018, freq='d')

# draw every variable independently from a normal distribution
synEne2018 = np.random.normal(loc=66.883795, scale=5.448145, size=365)
synMintp2018 = np.random.normal(loc=7.203288, scale=4.690315, size=365)
synCovidDailyCases2018 = np.random.normal(loc=0.0, scale=0.0, size=365)
synCovidDailyDeaths2018 = np.random.normal(loc=0.0, scale=0.0, size=365)

syn2018data = pd.DataFrame({'Date': dates2018, 'Total Daily Energy': synEne2018, 'MinTp': synMintp2018, 'DailyCovidCases': synCovidDailyCases2018, 'DailyCovidDeaths': synCovidDailyDeaths2018})
print(syn2018data)

fig, ax = plt.subplots()
sns.scatterplot(x="Date", y='MinTp', data=syn2018data, color='r')
Upvotes: 1
Views: 415
Reputation: 2167
A normal distribution has two parameters: the mean, which is the "loc" argument, and the standard deviation (std), which is the "scale" argument. The scale is not the total spread of your resulting values; the values follow the normal law, which basically means that 68% of your data will be within one std of the mean, 95% within two std, and 99.7% within three std. You can use a normal distribution table to get a grasp of the values you can expect. Keep in mind that such a table gives the probability p of getting a value below mean + z * std, not the probability of getting a value between mean - z * std and mean + z * std; you have to compute 2 * p - 1 to get the latter.
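For example, here is a rough sketch of how to check the 68-95-99.7 rule and the 2 * p - 1 conversion numerically (it assumes scipy is available for the table values):
import numpy as np
from scipy.stats import norm
mean, std = 7.203288, 4.690315
samples = np.random.normal(loc=mean, scale=std, size=100_000)
for z in (1, 2, 3):
    # share of samples within z standard deviations of the mean
    empirical = np.mean(np.abs(samples - mean) < z * std)
    # p = P(X < mean + z*std) from the cumulative table; 2*p - 1 covers the symmetric interval
    table = 2 * norm.cdf(z) - 1
    print(f"z={z}: empirical={empirical:.3f}, table={table:.3f}")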
If you lower the scale, you will get values closer to your mean.
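To illustrate (a small sketch with two arbitrary scales):
import numpy as np
wide = np.random.normal(loc=7.2, scale=4.69, size=365)
narrow = np.random.normal(loc=7.2, scale=0.2, size=365)
print(wide.min(), wide.max())      # roughly -7 to 21: large day-to-day swings
print(narrow.min(), narrow.max())  # stays close to 7.2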
To be more realistic, I would advise building a base curve for MinTp (with the minimum value in winter and the maximum value around August), then adding randomness on top of it using a normal distribution with loc=0 and scale=0.2 or so.
Using a sine curve from zero to pi as the base can do the trick if you plug your mean and your range into the sin function:
import math
# numpy, pandas, datetime, matplotlib and seaborn imported as in the question

start2018 = datetime.datetime(2018, 1, 1)
end2018 = datetime.datetime(2018, 12, 31)
dates2018 = pd.date_range(start2018, end2018, freq='d')

t_range = 4.690315  # our range
t_mean = 7.203288   # our mean

# base curve: half a sine period over the 365 days, scaled by the range and shifted by the mean
synMintp2018 = np.sin(np.arange(365)/365 * math.pi) * t_range + t_mean
# add a little day-to-day noise on top of the base curve
synMintp2018 += np.random.normal(loc=0, scale=0.2, size=365)
...
syn2018data = pd.DataFrame({'Date': dates2018, 'Total Daily Energy': synEne2018, 'MinTp': synMintp2018, 'DailyCovidCases': synCovidDailyCases2018, 'DailyCovidDeaths': synCovidDailyDeaths2018})
fig, ax = plt.subplots()
sns.scatterplot(x="Date", y='MinTp', data=syn2018data, color='r')
Since the minimum temperature is more likely to fall in mid-January than exactly on January 1st, we can add an offset to translate the base curve:
import math

# same base curve, but rolled forward so the minimum lands in mid-January
synMintp2018 = np.sin(np.arange(365)/365 * math.pi) * 4.690315 + 7.203288
synMintp2018 = np.roll(synMintp2018, 15)  # offset in days
synMintp2018 += np.random.normal(loc=0, scale=0.2, size=365)
...
syn2018data = pd.DataFrame({'Date': dates2018, 'Total Daily Energy': synEne2018, 'MinTp': synMintp2018, 'DailyCovidCases': synCovidDailyCases2018, 'DailyCovidDeaths': synCovidDailyDeaths2018})
fig, ax = plt.subplots()
sns.scatterplot(x=syn2018data.index, y=syn2018data['MinTp'], color='r')
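As a quick sanity check (a sketch reusing the synMintp2018 array from the snippet above), you can compare the generated series against the statistics you started from:
print("mean:", synMintp2018.mean())  # the non-negative sine base pushes this above the original 7.203288
print("std: ", synMintp2018.std())
print("min: ", synMintp2018.min(), "max:", synMintp2018.max())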
Upvotes: 1