Reputation: 127
I want to generate data in Python that behaves like real stock market data, which means I need to be able to specify and play around with all of the first four moments. Only being able to control skewness or only kurtosis is unfortunately not enough.
I found some answers in "How to generate a distribution with a given mean, variance, skew and kurtosis in Python?", but I seem unable to gain control of these properties with the gengamma distribution.
I know there are tons of distributions here: https://docs.scipy.org/doc/scipy/reference/stats.html#continuous-distributions. Maybe I can use one of them in some clever way? Or is there another way?
Upvotes: 6
Views: 1743
Reputation: 1
One way of generating such data is to repeatedly sample random numbers between a specified minimum and maximum until the sample statistics are within a given tolerance of the targets. See the following Python code for an example:
from scipy.stats import skew, kurtosis
import numpy as np

def Generator(lower, upper, m, v, kur, sk, n, tol=0.01):
    """
    Draw n random integers from [lower, upper) repeatedly until the sample's
    mean, variance, kurtosis, and skewness are all within tol of the targets.
    Parameters:
    lower (int): The lower limit (inclusive) of the range from which to generate random numbers.
    upper (int): The upper limit (exclusive) of the range from which to generate random numbers.
    m (float): The desired mean value for the generated data.
    v (float): The desired variance for the generated data.
    kur (float): The desired excess (Fisher) kurtosis, as returned by scipy.stats.kurtosis.
    sk (float): The desired skewness for the generated data.
    n (int): The number of random numbers to generate.
    tol (float, optional): The tolerance for the mean, variance, kurtosis, and skewness. Defaults to 0.01.
    Returns:
    list: A list of n random numbers that meet the specified statistical criteria.
    """
    while True:
        # np.arange(lower, upper) excludes upper, so values lie in lower..upper-1
        data = list(np.random.choice(np.arange(lower, upper), n, replace=True))
        if (abs(np.mean(data) - m) < tol and abs(np.var(data) - v) < tol
                and abs(kurtosis(data) - kur) < tol and abs(skew(data) - sk) < tol):
            return data
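The smaller you set the tolerance, the longer it takes to generate the data. The targets also need to be close to what uniform sampling over the range naturally produces, otherwise the loop may run for a very long time. Now, to generate 100 data points drawn from the integers 0 to 9 (the upper bound of 10 is exclusive), with mean=5, variance=9.5, etc., you have: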
g=Generator(lower=0, upper=10, m=5, v=9.5, kur=-1.5, sk=-0.3, n=100, tol=0.1)
and
np.mean(g), np.var(g), kurtosis(g), skew(g), len(g), min(g), max(g)
(4.98, 9.5996, -1.428594751943574, -0.25566031308484666, 100, 0, 9)
Upvotes: 0
Reputation: 8262
There are a number of potential choices of distribution family.
The classic example would be the Pearson family of distributions.
https://en.wikipedia.org/wiki/Pearson_distribution
These encompass scaled (including multiplication by negative values to get left-skewed distributions) and shifted versions of the beta, gamma, inverse gamma, t and F distributions, among others.
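To illustrate the scaling and shifting idea with just one member of the family, here is a minimal sketch using the gamma distribution. The function name sample_shifted_gamma and its parameters are made up for this example. It matches mean, variance, and skewness only; for a gamma the excess kurtosis is then fixed at 1.5 * skew**2, so hitting an arbitrary kurtosis would need a different member of the family.

```python
import numpy as np
from scipy import stats

def sample_shifted_gamma(mean, var, target_skew, size, seed=None):
    """Sample a shifted, scaled (and possibly mirrored) gamma distribution.

    Hypothetical helper for illustration: matches mean, variance, and
    skewness. Excess kurtosis is not free; it equals 1.5 * target_skew**2.
    """
    if target_skew == 0:
        raise ValueError("target_skew must be non-zero for a gamma")
    rng = np.random.default_rng(seed)
    k = 4.0 / target_skew**2          # gamma(k) has skewness 2 / sqrt(k)
    sign = np.sign(target_skew)       # negative scale mirrors to left skew
    scale = sign * np.sqrt(var / k)   # gamma(k) has variance k
    x = rng.gamma(shape=k, scale=1.0, size=size)
    return mean + scale * (x - k)     # gamma(k) has mean k

x = sample_shifted_gamma(mean=1.0, var=4.0, target_skew=-1.0,
                         size=200_000, seed=0)
print(np.mean(x), np.var(x), stats.skew(x), stats.kurtosis(x))
```

The printed sample statistics should land near the targets (mean 1, variance 4, skewness -1) with excess kurtosis near 1.5, up to sampling noise.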
Upvotes: 1
Reputation: 314
I think you are better off using the gengamma distribution in scipy, since its shape parameters, together with loc and scale, let you control the shape of the distribution.
from scipy.stats import gengamma
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gengamma.html
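As a rough sketch of how that could look: the shape parameters a and c determine skewness and kurtosis, and loc and scale can then be chosen to hit a mean and variance. The helper name gengamma_with_mean_var is made up for this example, and it does not solve for a and c from a target skewness and kurtosis; that would need a numerical search on top of this.

```python
import numpy as np
from scipy.stats import gengamma

def gengamma_with_mean_var(a, c, mean, var):
    """Hypothetical helper: freeze gengamma(a, c) with loc/scale chosen
    so that the distribution has the requested mean and variance."""
    m0, v0 = gengamma.stats(a, c, moments="mv")  # moments at loc=0, scale=1
    scale = np.sqrt(var / v0)
    loc = mean - scale * m0
    return gengamma(a, c, loc=loc, scale=scale)

# With c=1 the generalized gamma reduces to an ordinary gamma with shape a.
dist = gengamma_with_mean_var(a=2.0, c=1.0, mean=0.0, var=1.0)
m, v, s, k = (float(t) for t in dist.stats(moments="mvsk"))
print(m, v, s, k)  # k is excess kurtosis

samples = dist.rvs(size=1000, random_state=0)
```

Changing a and c moves skewness and kurtosis together, so it is worth printing the moments for a few parameter pairs to see which combinations are reachable.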
Hope this helps.
Upvotes: 0