negfrequency

Reputation: 1897

Creating a best fit probability distribution from pdf sample coordinates with scipy

Problem: I have data points representing coordinates sampled from a probability distribution (in this case we will assume a discrete probability distribution). We are essentially forming a 'best fit of a pdf' from pdf data here.

Given: sample coordinates of a pdf, and the type of pdf to fit to them (e.g. lognorm)

Return: ideally the pdf parameters, or alternatively the coordinates of the best-fit distribution.

I have not found an existing question on Stack Overflow asking/answering this, and I understand it may be poor practice. It seems that scipy expects the original raw data to build the pdf parameters from, not sample coordinates taken from a pdf.

I have the following vectors:

x = list(range(40))

y = 
[0.032935611986072325,
 0.15399668951796566,
 0.19217568076280733,
 0.16189644686218774,
 0.11504756998080325,
 0.09474568682103104,
 0.08971162676825704,
 0.06198299715985481,
 0.04408241680044377,
 0.026817519111333753,
 0.013562814925870696,
 0.007007365243147507,
 0.003909173588759217,
 0.0015053452905258473,
 0.00037481359597322736,
 0.0001378624720821066,
 5.734365756863486e-05,
 2.9711739672867803e-05,
 8.022169711674307e-06,
 5.942347934573561e-06,
 2.228380475465085e-06,
 3.7139674591084754e-06,
 8.913521901860341e-07,
 8.913521901860341e-07,
 5.94234793457356e-07,
 2.97117396728678e-07,
 2.97117396728678e-07,
 2.97117396728678e-07,
 1.48558698364339e-07,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]

Upvotes: 0

Views: 761

Answers (2)

Jean A.

Reputation: 301

I think your data represents a pdf {x, y = pdf(x)}, since sum(y) = 1. When we plot your data with a slight correction, x = list(range(39)) (so that x and y have the same length), we get a curve that looks like a lognormal:

import matplotlib.pyplot as plt

x = list(range(39))
plt.plot(x, y)
plt.show()

(Plot of y against x; the curve resembles a lognormal pdf.)

One trick you could use to avoid optimisation algorithms is to transform your data into a sample, since each y[i] is proportional to the frequency of x[i]. In other words, if you want a 'perfect' sample S of size N, each x[i] will appear N * y[i] times.

import numpy as np

N = 20_000  # note: 20.000 is the float 20.0 in Python, which would make every count 0
n_times = [int(y[i] * N) for i in range(len(y))]  # each x[i] repeated about N * y[i] times
S = np.repeat(x, n_times)
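
Since the question asked for scipy: the same synthetic sample S can also be passed to scipy's maximum-likelihood fitter, which returns the (shape, loc, scale) parametrisation directly. A minimal sketch:

from scipy import stats

# fit a lognormal to the synthetic sample by maximum likelihood
s, loc, scale = stats.lognorm.fit(S)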

All that remains is to fit a LogNormal distribution to S. Personally, I am used to the OpenTURNS library. You just need to format S as an ot.Sample, reshaping it into N points of dimension 1:

import openturns as ot

sample = ot.Sample([[p] for p in S])
fitdist = ot.LogNormalFactory().build(sample)

fitdist is an ot.Distribution; you can print it to see its parameters:

print(fitdist)
>>> LogNormal(muLog = 1.62208, sigmaLog = 0.45679, gamma = -1.79583)

or plot both curves using the built-in fitdist.computePDF method, which takes an ot.Sample as its argument:

plt.plot(x, y)
plt.plot(x, fitdist.computePDF(ot.Sample([[p] for p in x])))

(Plot of the original data together with the fitted lognormal pdf.)

Upvotes: 0

Julius

Reputation: 765

Calling your PDF f(x):

If your data really represents {x, f(x)}, then you could simply optimize for the parameters of f using e.g. scipy.optimize.leastsq: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq
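
A minimal sketch of that route, assuming the target family is lognormal (scipy.stats.lognorm); the initial guess p0 is illustrative and may need adjusting:

import numpy as np
from scipy import optimize, stats

# x and y as defined in the question (with x = list(range(39)) so lengths match)
def residuals(params, x, y):
    s, loc, scale = params
    return stats.lognorm.pdf(x, s, loc=loc, scale=scale) - y

p0 = [0.5, -1.0, 5.0]  # illustrative starting point for (s, loc, scale)
(s, loc, scale), ier = optimize.leastsq(residuals, p0, args=(np.asarray(x), np.asarray(y)))
print(s, loc, scale)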

If, on the other hand, your data are samples from the probability distribution, i.e. your data look like {x} where each x is drawn with probability f(x), then you should try Markov chain Monte Carlo to estimate f (see the sketch after these links). There are several choices for Python:

https://pystan.readthedocs.io/en/latest/

http://docs.pymc.io/notebooks/getting_started.html#Model-fitting
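
A hedged sketch of the MCMC route with PyMC3, assuming the samples sit in a 1-D array samples and that the target family is lognormal; the priors here are illustrative only:

import pymc3 as pm

# samples: 1-D array of draws from the unknown distribution (assumed)
with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)        # weak prior on the log-scale mean
    sigma = pm.HalfNormal("sigma", 5.0)    # weak prior on the log-scale spread
    pm.Lognormal("obs", mu=mu, sigma=sigma, observed=samples)
    trace = pm.sample(1000)                # run the sampler

pm.summary(trace)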

Upvotes: 1
