Juse Pallati

Reputation: 41

Fitting a binomial distribution to a curve with python

I am trying to fit this list to a binomial distribution: [0, 1, 1, 1, 3, 5, 5, 9, 14, 20, 12, 8, 5, 3, 6, 9, 13, 15, 18, 23, 27, 35, 25, 18, 12, 10, 9, 5, 0]

I need to retrieve the parameters of the distribution so that I can use them in some simulations. I am using scipy:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import binom

data = [0, 1, 1, 1, 3, 5, 5, 9, 14, 20, 12, 8, 5, 3, 6, 9, 13, 15, 18, 23, 27, 35, 25, 18, 12, 10, 9, 5, 0]

def fit_function(x, n, p):
    return binom.pmf(x, n, p)

num_bins = 10

params, covmat = curve_fit(fit_function, 10,  data)

But I get the following error:


RuntimeError Traceback (most recent call last) in
      4
      5 # fit with curve_fit
----> 6 parameters, cov_matrix = curve_fit(fit_function, 10, data)

~\AppData\Local\Continuum\anaconda3\envs\py37\lib\site-packages\scipy\optimize\minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
    746         cost = np.sum(infodict['fvec'] ** 2)
    747         if ier not in [1, 2, 3, 4]:
--> 748             raise RuntimeError("Optimal parameters not found: " + errmsg)
    749     else:
    750         # Rename maxfev (leastsq) to max_nfev (least_squares), if specified.

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 600.


Regardless of the error, how can I fit this data to a binomial curve with Python?

Upvotes: 0

Views: 2537

Answers (3)

Johnny Cheesecutter

Reputation: 2853

You can use the scipy.stats.fit function (available since SciPy 1.9):

from scipy import stats

DATA = [22, 23, 24, 25, 26, 27]

# Fit a binomial distribution to DATA, restricting n to the range 25..30
res = stats.fit(stats.binom, DATA, bounds={"n": (25, 30)})
print(res.params)
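
For intuition about what such a fit computes: scipy.stats.fit uses maximum likelihood by default. Below is a minimal, dependency-free sketch of that idea, assuming the values are samples (draws) and not heights of a curve. The helpers binom_logpmf and fit_binom_mle and the n_max cap are hypothetical names made up for illustration; they are not part of SciPy, and this is not how SciPy performs its search.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial pmf; requires 0 <= k <= n and 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)


def fit_binom_mle(samples, n_max):
    """Return (n, p, loglik) maximizing the likelihood over integer n <= n_max.

    Hypothetical helper: scans every admissible n and, for each one, uses the
    closed-form maximum-likelihood estimate p = mean / n.
    Returns None if no candidate n gives 0 < p < 1.
    """
    mean = sum(samples) / len(samples)
    best = None
    # n can never be smaller than the largest observed value
    for n in range(max(max(samples), 1), n_max + 1):
        p = mean / n
        if not 0 < p < 1:
            continue
        loglik = sum(binom_logpmf(k, n, p) for k in samples)
        if best is None or loglik > best[2]:
            best = (n, p, loglik)
    return best


# Same toy data as in the answer above, with n capped at 30 (an assumption)
DATA = [22, 23, 24, 25, 26, 27]
print(fit_binom_mle(DATA, n_max=30))
```

Scanning n explicitly keeps the integer constraint on n visible, which a gradient-based least-squares routine does not handle naturally.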

Upvotes: 0

erdogant

Reputation: 1694

The parameters of a discrete distribution can be retrieved with the distfit library. A small example follows:

pip install distfit

# Generate random numbers
from scipy.stats import binom
from distfit import distfit

# Set parameters for the test case
n = 8
p = 0.5

# Generate 10000 samples from the binomial distribution with parameters (n, p)
X = binom(n, p).rvs(10000)
print(X)
# [4 7 4 ... 2 2 6]

# Initialize distfit for discrete distributions
dfit = distfit(method='discrete')
# Search for best theoretical fit on your empirical data
dfit.fit_transform(X)

# Get the model and best fitted parameters.
print(dfit.model)

# {'distr': <scipy.stats._distn_infrastructure.rv_frozen at 0x1ff23e3beb0>,
#  'params': (8, 0.4999585504197037),
#  'name': 'binom',
#  'SSE': 7.786589839641551,
#  'chi2r': 1.1123699770916502,
#  'n': 8,
#  'p': 0.4999585504197037,
#  'CII_min_alpha': 2.0,
#  'CII_max_alpha': 6.0}

# The best fit is n=8 and p=0.4999, which matches the input of n=8 and p=0.5
dfit.model['n']
dfit.model['p']


# The plot function
dfit.plot(chart='PDF',
      emp_properties={'linewidth': 4, 'color': 'k'},
      bar_properties={'edgecolor':'k', 'color':None},
      pdf_properties={'linewidth': 4, 'color': 'r'})

[Image: Pareto plot]

Disclaimer: I am also the author of this repo.
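
Since the question mentions using the parameters in simulations: once n and p are known, new values can be drawn from the fitted distribution. With SciPy, binom(n, p).rvs(size) as used above does this. The following is only a dependency-free sketch of the same idea; simulate_binomial is a hypothetical helper name, and the fitted values are copied from the example output above.

```python
import random


def simulate_binomial(n, p, size, seed=None):
    """Draw `size` binomial(n, p) samples, each as a sum of n Bernoulli trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


# Fitted values copied from the example output above (illustrative only)
n_fit, p_fit = 8, 0.4999585504197037

samples = simulate_binomial(n_fit, p_fit, size=1000, seed=0)
print(min(samples), max(samples), sum(samples) / len(samples))
```

Passing a seed makes the simulated values reproducible between runs.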

Upvotes: 1

Diego

Reputation: 11

It seems you need to increase maxfev, the maximum number of function evaluations. Try:

params, covmat = curve_fit(fit_function, 10,  data, maxfev=2000)

Upvotes: 0
