Python Applying a CDF after Fitter

Question

I would like to apply the best-fit CDF found by Fitter to each value in a number of pandas DataFrame columns, ideally by passing the Fitter results to scipy.stats (or another library).

I can get the distribution function easily enough from Fitter with the following code:

import pandas as pd
import seaborn as sns
from fitter import Fitter, get_distributions

dataset = pd.read_csv("econ.csv")
dataset.head()

sns.set_style('white')
sns.set_context("paper", font_scale = 2)

sns.displot(data=dataset, x="Value_1", kind="hist", bins=100, aspect=1.5)

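# Fit every distribution returned by get_distributions() to the Value_1 column
spac = dataset['Value_1'].values
f = Fitter(spac, distributions=get_distributions())

f.fit()
f.summary()

# Best-fitting distribution, ranked by sum of squared errors
f.get_best(method='sumsquare_error')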

For Value_1, this gives the following output:

{'norminvgauss': {'a': 1.87, 'b': -0.65, 'loc': 0.46, 'scale': 1.24}}

Now this is where I am stuck:

Is there a way to pass this information back to scipy.stats (or another library) so I can evaluate the cumulative distribution function (CDF) of the best fit at each value in each column?
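
To make the question concrete, here is a minimal sketch of what I have in mind for a single column. It assumes the dict returned by get_best always holds exactly one entry, and that its key matches the name of a distribution in scipy.stats and its parameter names match that distribution's keyword arguments (this seems to hold for norminvgauss, but I have not checked every distribution). The variable names are only illustrative.

```python
import numpy as np
from scipy import stats

# Dict in the shape shown above for Value_1 (values copied from that output).
best = {'norminvgauss': {'a': 1.87, 'b': -0.65, 'loc': 0.46, 'scale': 1.24}}

# Assumption: the dict holds exactly one (name, params) pair.
dist_name, params = next(iter(best.items()))

# Look up the scipy.stats distribution by name and "freeze" it
# with the fitted parameters passed as keyword arguments.
frozen = getattr(stats, dist_name)(**params)

# Evaluate the fitted CDF at some example values.
values = np.array([0.7, 0.9])
cdf_values = frozen.cdf(values)
print(cdf_values)
```

If that assumption about names is right, the frozen object should also expose pdf, ppf and the other usual scipy.stats methods.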

The dataset has columns Value_1 through Value_99 and about 400 rows. Once I know how to feed the Fitter results back into scipy.stats, I should be able to write a simple for loop to apply this to each column.
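
The loop I am picturing would look roughly like the sketch below. The helper name add_best_fit_cdf, the small demo frame, and the hard-coded fits are all hypothetical stand-ins so the snippet runs without Fitter; in practice each entry of the fits dict would come from calling get_best on a Fitter object fitted to that column. It again assumes the distribution names and parameter names map directly onto scipy.stats.

```python
import pandas as pd
from scipy import stats

def add_best_fit_cdf(df, best_fits):
    """Return a copy of df with one CDF_BestFit_<col> column per fitted column.

    best_fits maps a column name to a dict shaped like
    {'dist_name': {'param': value, ...}} (the shape shown for Value_1).
    """
    out = df.copy()
    for col, best in best_fits.items():
        # Assumption: each dict holds exactly one (name, params) pair.
        dist_name, params = next(iter(best.items()))
        frozen = getattr(stats, dist_name)(**params)
        out[f"CDF_BestFit_{col}"] = frozen.cdf(out[col].to_numpy())
    return out

# Hypothetical stand-in for the real data and the real Fitter results.
demo = pd.DataFrame({"Value_1": [0.9, 0.7], "Value_2": [1.5, -0.2]})
demo_fits = {
    "Value_1": {"norminvgauss": {"a": 1.87, "b": -0.65, "loc": 0.46, "scale": 1.24}},
    "Value_2": {"norm": {"loc": 0.5, "scale": 1.0}},
}

result = add_best_fit_cdf(demo, demo_fits)
print(result)
```

Fitting every available distribution to 99 columns is likely to be slow, so restricting the distributions argument of Fitter to a shorter list may be worth considering.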

An example of the desired result:

ID	Value_1	CDF_BestFit_Value_1
n	0.9	0.33
n+1	0.7	0.07

Many thanks in advance to anyone who is able to help with this.
