Reputation: 41
I am trying to fit this list to a binomial distribution: [0, 1, 1, 1, 3, 5, 5, 9, 14, 20, 12, 8, 5, 3, 6, 9, 13, 15, 18, 23, 27, 35, 25, 18, 12, 10, 9, 5, 0]
I need to retrieve the parameters of the distribution so I can use them in some simulations. I am using scipy:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import binom
data = [0, 1, 1, 1, 3, 5 , 5, 9, 14, 20, 12, 8, 5, 3, 6, 9, 13, 15, 18, 23, 27, 35, 25, 18, 12, 10, 9, 5 , 0]
def fit_function(x, n, p):
    return binom.pmf(x, n, p)
num_bins = 10
params, covmat = curve_fit(fit_function, 10, data)
But I get the following error:
RuntimeError                              Traceback (most recent call last) in
      4
      5 # fit with curve_fit
----> 6 parameters, cov_matrix = curve_fit(fit_function, 10, data)

~\AppData\Local\Continuum\anaconda3\envs\py37\lib\site-packages\scipy\optimize\minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
    746         cost = np.sum(infodict['fvec'] ** 2)
    747         if ier not in [1, 2, 3, 4]:
--> 748             raise RuntimeError("Optimal parameters not found: " + errmsg)
    749         else:
    750             # Rename maxfev (leastsq) to max_nfev (least_squares), if specified.

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 600.
Regardless of the error, how can I fit this data to a binomial curve with Python?
Upvotes: 0
Views: 2537
Reputation: 2853
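You can use the scipy.stats.fit function (available in SciPy 1.9 and later):
from scipy.stats import binom, fit
DATA = [22, 23, 24, 25, 26, 27]
# n needs explicit finite bounds; p can be omitted and is searched over [0, 1]
res = fit(binom, DATA, bounds={"n": [25, 30]})
print(res.params)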
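As a sanity check on the numerical fit: when n is known in advance, the maximum-likelihood estimate of p has a closed form, the sample mean divided by n. A minimal sketch, where `n_known = 30` is an assumed value picked only for illustration and is not taken from the answer above:

```python
from statistics import mean

# Same toy sample as in the answer; n_known is an assumption for illustration.
DATA = [22, 23, 24, 25, 26, 27]
n_known = 30

# For a binomial with known n, the maximum-likelihood estimate of p is mean / n.
p_hat = mean(DATA) / n_known
print(p_hat)
```

If the value of p returned by the fit for the same n is far from this number, the optimizer probably did not converge.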
Upvotes: 0
Reputation: 1694
One way to retrieve the parameters of a discrete distribution is with the distfit library. A small example follows:
pip install distfit
# Generate random numbers
from scipy.stats import binom
from distfit import distfit
# Set parameters for the test-case
n = 8
p = 0.5
# Generate 10000 samples of the distribution of (n, p)
X = binom(n, p).rvs(10000)
print(X)
# [4 7 4 ... 2 2 6]
dfit = distfit(method='discrete')
# Search for best theoretical fit on your empirical data
dfit.fit_transform(X)
# Get the model and best fitted parameters.
print(dfit.model)
# {'distr': <scipy.stats._distn_infrastructure.rv_frozen at 0x1ff23e3beb0>,
# 'params': (8, 0.4999585504197037),
# 'name': 'binom',
# 'SSE': 7.786589839641551,
# 'chi2r': 1.1123699770916502,
# 'n': 8,
# 'p': 0.4999585504197037,
# 'CII_min_alpha': 2.0,
# 'CII_max_alpha': 6.0}
# Best fitted n=8 and p=0.4999 which is great because the input was n=8 and p=0.5
dfit.model['n']
dfit.model['p']
# The plot function
dfit.plot(chart='PDF',
          emp_properties={'linewidth': 4, 'color': 'k'},
          bar_properties={'edgecolor': 'k', 'color': None},
          pdf_properties={'linewidth': 4, 'color': 'r'})
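Since the question asks for parameters to use in simulations, here is a rough sketch of that last step. It does not need distfit; the values of `n_fit` and `p_fit` are simply copied from the printed model above, and the sample size and seed are arbitrary choices:

```python
from scipy.stats import binom

# Values copied from the printed dfit.model above.
n_fit, p_fit = 8, 0.4999585504197037

# Freeze the fitted distribution and draw reproducible samples from it.
fitted = binom(n_fit, p_fit)
samples = fitted.rvs(size=1000, random_state=42)

print(samples[:10])
print(samples.mean(), fitted.mean())
```

The sample mean should land close to n * p, which is a quick way to confirm that the simulated values come from the intended distribution.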
Disclaimer: I am also the author of this repo.
Upvotes: 1
Reputation: 11
It seems you need to increase maxfev, the maximum number of function evaluations. Try:
params, covmat = curve_fit(fit_function, 10, data, maxfev=2000)
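Raising maxfev on its own may not be enough here. The call passes the scalar 10 as xdata, while curve_fit expects one x value per y value, and n is an integer parameter, which gradient-based least squares handles poorly. A hedged sketch of one way around both issues, using synthetic data rather than the list from the question: `n_fixed = 20` and the true p of 0.3 are assumptions for illustration, and only p is fitted.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import binom

n_fixed = 20                      # assumed known; not taken from the question
x = np.arange(n_fixed + 1)        # one x value per observed frequency
y = binom.pmf(x, n_fixed, 0.3)    # synthetic "observed" relative frequencies

def fit_function(x, p):
    # Only p is free; n stays fixed at an integer value.
    return binom.pmf(x, n_fixed, p)

# Start from p = 0.5 and keep p inside [0, 1] during the search.
params, covmat = curve_fit(fit_function, x, y, p0=[0.5], bounds=(0, 1))
p_hat = params[0]
print(p_hat)
```

With real counts, y would first have to be normalized to sum to 1, and several candidate values of n could be tried in a loop, keeping the one with the smallest residual.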
Upvotes: 0