user1773603
user1773603

Reputation:

Attempt of fit a Gaussian distribution: Error scipy/optimize/minpack.py", line 765, in curve_fit raise ValueError("`sigma` has incorrect shape.")

I have a pretty known issue but impossible to fix for the moment. This is about curve_fit function. I get the error:

Error scipy/optimize/minpack.py", line 765, in curve_fit raise ValueError("sigma has incorrect shape.")

Here is the code, don't make caution to the loop for, it is just I would like 5 different histograms:

for i in range(5):
  mean_o[i] = np.mean(y3[:,i])
  sigma_o[i] = np.std(y3[:,i])

## Histograms
# Number of bins
Nbins=100
binwidth = np.zeros(5)

# Fitting curves
def gaussian(x, a, mean, sigma):
  return a * np.exp(-((x - mean)**2 / (2 * sigma**2)))

for i in range(5):

  binwidth[i] = (max(y3[:,i]) - min(y3[:,i]))/Nbins
  bins_plot = np.arange(min(y3[:,i]), max(y3[:,i]) + binwidth[i], binwidth[i])
  plt.title('Distribution of O observable for redshift bin = '+str(z_ph_fid[i]))
  plt.hist(y3[:,i], bins=bins_plot, label='bin '+str(z_ph_fid[i]))
  plt.legend(loc='upper right')
  # Fitting and plot
  range_fit = np.linspace(min(y3[:,i]), max(y3[:,i]), len(y3[:,i]))
  popt, pcov = curve_fit(gaussian, range_fit, y3[:,i], mean_o[i], sigma_o[i])
  plt.plot(range_fit, gaussian(range_fit, *popt))
  # Save figure
  plt.savefig('chi2_gaussian_bin_'+str(i+1)+'.png')
  plt.close()

The first histogram i=0 look like:

Hisotgram

I would like to plot a red Gaussian fit over the histogram.

Upvotes: 0

Views: 452

Answers (1)

mikuszefski
mikuszefski

Reputation: 4043

There are two problems with the OP.

First problem is that the code tries to fit the random samples with a normal distribution. This is wrong. One can fit the output of the histogram, though. This is shown in the code below. Better would be the use of scipy.stats.norm.fit() which allows to fit the random samples. This is also shown.

Second problem is the sigma-shape. Here curve_fit is actually expecting errors on the y-data, which naturally needs the shape of the y-data. What should have been done is: providing start values for the fit. This is shown below as well.

Code looks like:

import matplotlib.pyplot as plt
import numpy as np

from scipy.stats import norm
from scipy.optimize import curve_fit

mean_o = list()
sigma_o = list()
y3 = list()

### generate some data
for i in range( 5 ):
    y3.append( norm.rvs( size=150000 ) )
y3 = np.transpose( y3 )

for i in range(5):
    mean_o.append( np.mean( y3[ :, i ] ) )
    sigma_o.append(  np.std( y3[ :, i ] ) )

## Histograms
# Number of bins
Nbins=100
binwidth = np.zeros(5)

# Fitting curves
def gaussian( x, a , mean, sigma ):
  return a * np.exp( -( ( x - mean )**2 / (2 * sigma**2 ) ) )

fig = plt.figure()
ax = { i : fig.add_subplot( 2, 3, i+1) for i in range( 5 ) }


for i in range(5):
    ymin = min(y3[:,i])
    ymax = max(y3[:,i])
    binwidth[i] = ( ymax - ymin) / Nbins
    bins_plot = np.arange( ymin, ymax + binwidth[i], binwidth[i])
    histdata = ax[i].hist(
        y3[:,i],
        bins=bins_plot,
        label='bin '+str(i)
    )

    range_fit = np.linspace( ymin, ymax, 250)
    # Fitting and plot version 1
    popt, pcov = curve_fit(
        gaussian,
        0.5 * ( histdata[1][:-1] + histdata[1][1:] ),
        histdata[0],
        p0=( max(histdata[0]), mean_o[i], sigma_o[i] ) )
    ax[i].plot(range_fit, gaussian( range_fit, *popt ) )
    ax[i].axvline( x=mean_o[i], ls=':', c='r' )

    # Fitting and plot version 2
    params = norm.fit( y3[ ::, i ], loc=mean_o[i], scale=sigma_o[i] )
    nth = gaussian(
        range_fit,
        len( y3[::, i]) * binwidth[i] / np.sqrt( 2 * np.pi ),
        *params
    )
    ax[i].plot(range_fit, nth, ls="--" )
plt.tight_layout()
plt.show()

enter image description here

Upvotes: 0

Related Questions