How to generate predictions based on the distribution of the data using Python

Question

My dataframe currently looks like this (let's call this df_1).

date   var1 
1-1-01 0.1   
2-1-01 0.02 
3-1-01 3.00   
4-1-01 4.5   
5-1-01 0.9   
6-1-01 0.22

var1 is normally distributed (see photo below).

I have another dataframe that simply consists of dates with no values for var1 (let's call this df_2):

date   var1 
1-2-01    
2-2-01 
3-2-01 
4-2-01 
5-2-01 
6-2-01

I simply want predictions based on random draws from the normal distribution of var1 in df_1. How can I do this in Python?

PS: Do not worry about the kurtosis (height) of the distribution at 0. I know it is highest there. Treat the mean of the distribution (as well as the median and mode) as 0. I want to make sure this is taken into account when making predictions.

stevemo · Accepted Answer

You can fit a normal distribution to var1 and then draw samples from it:

from scipy import stats
import numpy as np

# fit a normal distribution to var1 in df_1
mu, std = stats.norm.fit(df_1['var1'])

# draw one value per row of df_2
df_2['var1'] = np.random.normal(mu, std, size=len(df_2))

But please note that what you're asking for ignores the dates, which means you're ignoring any time series structure in the data.
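
For completeness, here is a self-contained sketch using the sample values from the question. It also shows one possible way to honor the PS about the mean being 0: passing floc=0 to norm.fit pins the location at 0 and estimates only the spread. That choice, and the seed value, are assumptions for illustration and not part of the original answer; drop floc=0 to let the mean be estimated from the data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# df_1 holds the observed values shown in the question
df_1 = pd.DataFrame({
    "date": ["1-1-01", "2-1-01", "3-1-01", "4-1-01", "5-1-01", "6-1-01"],
    "var1": [0.1, 0.02, 3.00, 4.5, 0.9, 0.22],
})

# df_2 holds only the dates that need values
df_2 = pd.DataFrame({
    "date": ["1-2-01", "2-2-01", "3-2-01", "4-2-01", "5-2-01", "6-2-01"],
})

# Fit the normal distribution with the mean pinned at 0 (assumption from the PS);
# only the standard deviation is estimated from the data
mu, std = stats.norm.fit(df_1["var1"], floc=0)

# Seeded generator so the draws are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)

# One independent draw per row of df_2; the dates play no role in the draw
df_2["var1"] = rng.normal(loc=mu, scale=std, size=len(df_2))

print(df_2)
```

Each row of df_2 gets an independent draw, so the caveat about ignoring time series structure still applies.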
