My dataframe currently looks like this (let's call it df_1):
date var1
1-1-01 0.1
2-1-01 0.02
3-1-01 3.00
4-1-01 4.5
5-1-01 0.9
6-1-01 0.22
The values of var1 are normally distributed (see photo below).
I have another dataframe that consists only of dates, with no values for var1 (let's call it df_2):
date var1
1-2-01
2-2-01
3-2-01
4-2-01
5-2-01
6-2-01
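To experiment with the two frames, they can be rebuilt in pandas roughly as follows. This is a sketch, not the asker's code. It assumes the dates are day-month-year (so 2-1-01 is 2 January 2001 and 1-2-01 is 1 February 2001). The text does not say which format is used, so change the `format` string if the dates are month-day-year.

```python
import pandas as pd

# Sample data from the question
df_1 = pd.DataFrame({
    "date": ["1-1-01", "2-1-01", "3-1-01", "4-1-01", "5-1-01", "6-1-01"],
    "var1": [0.1, 0.02, 3.00, 4.5, 0.9, 0.22],
})

# df_2 has only dates; var1 stays empty (NaN) until it is filled with random draws
df_2 = pd.DataFrame({
    "date": ["1-2-01", "2-2-01", "3-2-01", "4-2-01", "5-2-01", "6-2-01"],
    "var1": float("nan"),
})

# Parse the date strings, ASSUMING a day-month-year format
for df in (df_1, df_2):
    df["date"] = pd.to_datetime(df["date"], format="%d-%m-%y")
```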
I want to fill in var1 for df_2 with random draws from the normal distribution of var1 in df_1. How can I do this in Python?
PS: Do not worry about the height of the distribution's peak at 0; I know it is the highest point. Treat 0 as the mean of the distribution (and also its median and mode). I want to make sure this is taken into account when making predictions.
You can fit a normal distribution to var1 and then draw samples from it:
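import numpy as np
from scipy import stats

# fit a normal distribution to var1 (maximum-likelihood estimates of mean and standard deviation)
mu, std = stats.norm.fit(df_1['var1'])
# draw one random value per date in df_2
df_2['var1'] = np.random.normal(mu, std, size=len(df_2))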
Note, however, that this approach ignores the dates entirely, so any time-series structure in var1 (trend, seasonality, autocorrelation) is lost.
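The PS asks for the mean to be treated as 0. If the mean is known to be exactly 0 rather than estimated from the data, SciPy's `fit` can hold the location parameter fixed with `floc=0`, so that only the standard deviation is estimated. The sketch below does this and then uses NumPy's `Generator` API for the draws. The seed value and the inline sample data are illustrative assumptions. Also note that all six sample values shown are positive, so forcing the mean to 0 is only appropriate if you have outside reason to believe it.

```python
import numpy as np
import pandas as pd
from scipy import stats

# var1 values from df_1 in the question
var1 = pd.Series([0.1, 0.02, 3.00, 4.5, 0.9, 0.22])

# Fit a normal distribution with the mean (loc) held fixed at 0;
# only the standard deviation (scale) is estimated from the data.
mu, std = stats.norm.fit(var1, floc=0)

# One random draw per date in df_2; the seed is arbitrary and only makes runs repeatable
rng = np.random.default_rng(seed=42)
df_2 = pd.DataFrame({"date": ["1-2-01", "2-2-01", "3-2-01", "4-2-01", "5-2-01", "6-2-01"]})
df_2["var1"] = rng.normal(loc=mu, scale=std, size=len(df_2))
```

With the location fixed at 0, the fitted scale is the root mean square of the data, `sqrt(mean(var1**2))`, rather than the usual standard deviation around the sample mean.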