use scipy.stats to automatically fit and use the parameter in pdf calculation

Question

I would like my program to automatically choose the best-fitting distribution and use that distribution's probability density function to calculate the probability.

1. Use scipy.stats.rv_continuous.fit to get the fitted parameters, e.g.

   paras = scipy.stats.norm.fit(data_array)

2. Use scipy.stats.kstest to test the goodness of fit, e.g.

   fitness = scipy.stats.kstest(data_array, 'norm', args=paras)

3. Choose the distribution that gives the lowest kstest score.

4. Calculate the probability, e.g.

   scipy.stats.norm.pdf(my_values, paras)
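The steps above can be sketched for several candidate distributions at once. This is only a minimal illustration, not a definitive implementation: the helper name `best_fit`, the candidate list (norm, expon, gamma), and the synthetic data are assumptions made for the example. It ranks candidates by the KS statistic, as described above; note that the KS p-value is not exact when the parameters were estimated from the same data, so the statistic is used here only for ranking.

```python
import numpy as np
from scipy import stats


def best_fit(data, candidates=(stats.norm, stats.expon, stats.gamma)):
    """Fit each candidate and return (distribution, params) with the smallest KS statistic.

    `best_fit` and the default candidate list are hypothetical, chosen for illustration.
    """
    best = None
    for dist in candidates:
        # fit() returns (shape parameters..., loc, scale) as one flat tuple.
        params = dist.fit(data)
        # Compare the data against the fitted CDF; args supplies the fitted parameters.
        ks_stat = stats.kstest(data, dist.cdf, args=params).statistic
        if best is None or ks_stat < best[0]:
            best = (ks_stat, dist, params)
    return best[1], best[2]


# Synthetic data, assumed only for the sake of a runnable example.
rng = np.random.default_rng(0)
data_array = rng.normal(loc=5.0, scale=2.0, size=500)

dist, paras = best_fit(data_array)

my_values = np.array([3.0, 5.0, 7.0])
# Unpack the tuple so each fitted value lands in its own argument slot.
density = dist.pdf(my_values, *paras)
```

Because `paras` always has the layout (shapes..., loc, scale), the same `dist.pdf(my_values, *paras)` call works whether the winning distribution has zero, one, or more shape parameters.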

I am not sure whether this is a rigorously correct way to choose the best-fit distribution. Currently it works well for the normal distribution.

My problem is how to pass the arguments to scipy.stats.rv_continuous.pdf(). For some distributions, scipy.stats.rv_continuous.fit() returns three parameters: the shape, loc and scale. I tried to pass them directly, like

scipy.stats.rv_continuous.pdf(my_values, paras)

This gives me two pdf values for each point.

I also tried to pass them in this way

scipy.stats.rv_continuous.pdf(my_values, paras[0], paras[1], paras[2])

But the outcome is weird. Has anybody tried to do something like this and run into the same kind of problem?
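For reference, a minimal sketch of how the fitted tuple can be handed to pdf(), assuming a gamma distribution and synthetic data purely for illustration. fit() returns the values in the order (shape parameters..., loc, scale), and pdf() expects them in the same order after the evaluation points, so the tuple has to be unpacked. If the tuple is passed as a single argument, it is treated as an array of values for the first parameter and broadcast against the input, which is one way to end up with several pdf values per point.

```python
import numpy as np
from scipy import stats

# Synthetic data, assumed only for the example.
rng = np.random.default_rng(1)
data_array = rng.gamma(shape=2.0, scale=1.5, size=400)

# For gamma, fit() returns (a, loc, scale): one shape parameter, then loc and scale.
paras = stats.gamma.fit(data_array)
my_values = np.array([0.5, 1.0, 2.0])

# Option 1: unpack the whole tuple positionally.
pdf_unpacked = stats.gamma.pdf(my_values, *paras)

# Option 2: split off loc and scale and pass them by keyword.
*shape_args, loc, scale = paras
pdf_keywords = stats.gamma.pdf(my_values, *shape_args, loc=loc, scale=scale)

# Option 3: freeze the distribution once, then reuse it.
frozen = stats.gamma(*paras)
pdf_frozen = frozen.pdf(my_values)
```

All three options evaluate the same density; the frozen form is convenient when the same fitted distribution is evaluated many times.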

My goal is to replace the Gaussian with a better-fitting distribution in naive Bayes classification, in the hope of improving the prediction accuracy.

