badavadapav

Reputation: 91

How to generate predictions based on the distribution of the data using Python

My dataframe currently looks like this (let's call it df_1):

date   var1 
1-1-01 0.1   
2-1-01 0.02 
3-1-01 3.00   
4-1-01 4.5   
5-1-01 0.9   
6-1-01 0.22   

var1 is normally distributed. [image: distribution of var1]

I have another dataframe that consists only of dates, with no values for var1 (let's call it df_2):

date   var1 
1-2-01    
2-2-01 
3-2-01 
4-2-01 
5-2-01 
6-2-01 

I simply want predictions for df_2 based on random draws from the normal distribution of var1 in df_1. How can I do this in Python?

PS: Don't worry about the height (peakedness) of the distribution at 0; I know that is where it is highest. Think of the mean of the distribution (as well as the median and mode) as 0. I want to make sure this is taken into account when making predictions.

Upvotes: 0

Views: 1216

Answers (1)

stevemo

Reputation: 1097

You can fit a normal distribution to var1 and then draw samples from it:

import numpy as np
from scipy import stats

# fit a normal distribution to var1 in df_1 (returns the mean and standard deviation)
mu, std = stats.norm.fit(df_1['var1'])

# draw one random value per row of df_2
df_2['var1'] = np.random.normal(mu, std, size=len(df_2))

Note, however, that what you're asking for ignores the dates, so it discards any time-series structure in the data.
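
For completeness, here is a rough end-to-end sketch that also covers the PS about the mean being 0. It rebuilds df_1 and df_2 from the toy rows in the question and uses only NumPy and pandas. The helper `fit_normal` and its `fixed_mean` argument are names made up for this example, not part of any library. For a normal distribution the maximum-likelihood fit is just the sample mean and the population standard deviation, so the helper should give the same numbers as `stats.norm.fit`; I believe passing `floc=0` to `stats.norm.fit` holds the mean at 0 in the same way, but check that against the SciPy docs for your version.

```python
import numpy as np
import pandas as pd

# Toy data copied from the question; the real df_1 would have many more rows.
df_1 = pd.DataFrame({
    "date": ["1-1-01", "2-1-01", "3-1-01", "4-1-01", "5-1-01", "6-1-01"],
    "var1": [0.1, 0.02, 3.00, 4.5, 0.9, 0.22],
})
df_2 = pd.DataFrame({
    "date": ["1-2-01", "2-2-01", "3-2-01", "4-2-01", "5-2-01", "6-2-01"],
})


def fit_normal(values, fixed_mean=None):
    """Maximum-likelihood normal fit (hypothetical helper for this example).

    If fixed_mean is given, the mean is held at that value and only the
    standard deviation is estimated.
    """
    x = np.asarray(values, dtype=float)
    mu = float(x.mean()) if fixed_mean is None else float(fixed_mean)
    std = float(np.sqrt(np.mean((x - mu) ** 2)))
    return mu, std


# The seed is an arbitrary choice so that the draws are reproducible.
rng = np.random.default_rng(seed=0)

# Assumption taken from the question's PS: treat the mean as exactly 0.
mu, std = fit_normal(df_1["var1"], fixed_mean=0.0)

# One independent random draw per row of df_2; the dates are not used.
df_2["var1"] = rng.normal(loc=mu, scale=std, size=len(df_2))
```

Drop `fixed_mean=0.0` if you would rather let the data determine the mean. Either way, every row of df_2 gets an independent draw, so the caveat about ignoring the dates still applies.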

Upvotes: 1
