MycrofD

Reputation: 231

Gaussian fit to histogram data in Python: Trust Region vs. Levenberg-Marquardt

My histogram plot clearly shows two peaks, but when I curve-fit it with a double Gaussian, the fit shows just one peak. I have followed almost every relevant answer on Stack Overflow but failed to get the correct result. My teacher previously did this in Fortran and got two peaks. I used leastsq from Python's scipy.optimize in one trial. Should I post my data as well? Here is my code.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

binss = int((max(x) - min(x))/0.05)    # 0.05 is my bin width; plt.hist needs an integer bin count
n, bins, patches = plt.hist(x, binss, color='grey')    # gives the histogram

x_avg = (bins[:-1] + bins[1:]) / 2    # bin centres
y_real = n

def gauss(x, A, mu, sigma):
    return A*np.exp(-(x - mu)**2./(2.*sigma**2))    # vectorised; np.exp instead of the undefined e
A1, A2, m1, m2, sd1, sd2 = [25, 30, 0.3, 0.6, -0.9, -0.9]

#Initial guesses for leastsq
p = [A1, A2, m1, m2, sd1, sd2]
y_init = gauss(x_avg, A1, m1, sd1) + gauss(x_avg, A2, m2, sd2)    #initially guessed y

def residual(p, x, y):
    A1, A2, m1, m2, sd1, sd2 = p
    y_fit = gauss(x, A1, m1, sd1) + gauss(x, A2, m2, sd2)
    err = y - y_fit
    return err

sf = leastsq(residual, p, args=(x_avg, y_real))

y_fitted1 = gauss(x_avg, sf[0][0], sf[0][2], sf[0][4])
y_fitted2 = gauss(x_avg, sf[0][1], sf[0][3], sf[0][5])

y_fitted = y_fitted1 + y_fitted2

plt.plot(x_avg, y_init, 'b', label='Starting Guess')
plt.plot(x_avg, y_fitted, color='red', label='Fitted Data')
plt.plot(x_avg, y_fitted1, color='black', label='Fitted1 Data')
plt.plot(x_avg, y_fitted2, color='green', label='Fitted2 Data')
plt.legend()
plt.show()

Even the figure I get is not smooth: there are only 54 points in x_avg. Please help. I can't even post the figure here.

When plotting in MATLAB, I obtained the correct results. Reason: MATLAB uses a Trust Region algorithm instead of the Levenberg-Marquardt algorithm, which is not suitable for bound constraints.

The correct results come only when the data is modelled as a sum of 3 individual Gaussians, not 2.

How do I decide which algorithm to use, and when?
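For reference, scipy's curve_fit lets you choose between the two: method='lm' is Levenberg-Marquardt (fast, but no bounds allowed), while method='trf' is a Trust Region Reflective solver that accepts bound constraints. A minimal sketch on synthetic two-peak data (the data, initial guesses, and bounds here are illustrative, not taken from the question):

```python
import numpy as np
from scipy.optimize import curve_fit

def double_gauss(x, A1, m1, s1, A2, m2, s2):
    return (A1 * np.exp(-(x - m1)**2 / (2 * s1**2))
            + A2 * np.exp(-(x - m2)**2 / (2 * s2**2)))

# synthetic two-peak data (assumption: stands in for the real histogram)
x = np.linspace(-3, 3, 200)
y = double_gauss(x, 25, -1.0, 0.4, 30, 1.2, 0.5)

p0 = [20, -0.8, 0.5, 25, 1.0, 0.5]  # initial guesses

# 'lm' = Levenberg-Marquardt: unconstrained only
popt_lm, _ = curve_fit(double_gauss, x, y, p0=p0, method='lm')

# 'trf' = Trust Region Reflective: supports bounds,
# e.g. forcing amplitudes and widths to stay positive
lower = [0, -3, 0.01, 0, -3, 0.01]
upper = [100, 3, 3, 100, 3, 3]
popt_trf, _ = curve_fit(double_gauss, x, y, p0=p0,
                        method='trf', bounds=(lower, upper))
```

With clean data and decent guesses both methods converge; 'trf' becomes the better choice once you need bound constraints, since 'lm' simply does not accept them.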

Upvotes: 2

Views: 3291

Answers (2)

MycrofD

Reputation: 231

I added another Gaussian term, so p took 9 parameters in total. Thus

p = [A1, A2, A3, m1, m2, m3, sd1, sd2, sd3]
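The extended three-Gaussian residual can be sketched like this (the synthetic x_avg/y_real data and the particular guesses below are placeholders, not the original data):

```python
import numpy as np
from scipy.optimize import leastsq

def gauss(x, A, mu, sigma):
    return A * np.exp(-(x - mu)**2 / (2 * sigma**2))

def residual(p, x, y):
    A1, A2, A3, m1, m2, m3, sd1, sd2, sd3 = p
    y_fit = (gauss(x, A1, m1, sd1)
             + gauss(x, A2, m2, sd2)
             + gauss(x, A3, m3, sd3))
    return y - y_fit

# synthetic two-peak data standing in for the real histogram counts
x_avg = np.linspace(0, 1.2, 54)
y_real = gauss(x_avg, 25, 0.3, 0.08) + gauss(x_avg, 30, 0.6, 0.1)

p = [25, 30, 5, 0.3, 0.6, 0.45, 0.1, 0.1, 0.2]  # 9 initial guesses
sf = leastsq(residual, p, args=(x_avg, y_real))

y_fitted = (gauss(x_avg, sf[0][0], sf[0][3], sf[0][6])
            + gauss(x_avg, sf[0][1], sf[0][4], sf[0][7])
            + gauss(x_avg, sf[0][2], sf[0][5], sf[0][8]))
```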

Then another term, y_fitted3, was defined and added to y_fitted. This gave a correct figure with two peaks fitting perfectly, except that the curve was not smooth at all! Searching Stack Overflow then led me to spline interpolation, i.e.

from scipy.interpolate import make_interp_spline  # scipy.interpolate.spline was removed in SciPy 1.3; make_interp_spline replaces it

and then at the end,

x_new = np.linspace(x_avg.min(), x_avg.max(), 30000)
ysmooth = make_interp_spline(x_avg, y_fitted)(x_new)
plt.plot(x_new, ysmooth)
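Alternatively, since the fitted model is analytic, spline interpolation can be avoided entirely by evaluating the Gaussians themselves on a dense grid. (The parameter values below are made up for illustration; in practice they would come from the leastsq result sf[0].)

```python
import numpy as np

def gauss(x, A, mu, sigma):
    return A * np.exp(-(x - mu)**2 / (2 * sigma**2))

# hypothetical fitted parameters standing in for sf[0]
A1, A2, m1, m2, sd1, sd2 = 25, 30, 0.3, 0.6, 0.1, 0.12

# evaluate the analytic model directly on a dense grid -- no interpolation needed
x_new = np.linspace(0, 1.2, 30000)
y_smooth = gauss(x_new, A1, m1, sd1) + gauss(x_new, A2, m2, sd2)
```

This is smooth by construction and cannot overshoot between data points the way a spline can.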

And there it was. Checking Wikipedia, it says that the Levenberg-Marquardt implementation used by Python also employs a trust-region strategy, so trying leastsq again gave the result. But I am still not clear about the difference seen in MATLAB. Extra input would be appreciated! Thank you.

Upvotes: 2

emesday

Reputation: 6186

Your problem looks like a mixture of Gaussians, also known as a Gaussian mixture model (GMM). There are several implementations; the one in sklearn is worth considering.

import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt

comp0 = np.random.randn(1000) - 5 # samples of the 1st component
comp1 = np.random.randn(1000) + 5 # samples of the 2nd component

x = np.hstack((comp0, comp1)) # merge them

gmm = mixture.GaussianMixture(n_components=2) # GMM for two components (mixture.GMM was renamed GaussianMixture in scikit-learn 0.18)
gmm.fit(x.reshape(-1, 1)) # train it! (fit expects a 2-D array of shape (n_samples, n_features))

linspace = np.linspace(-10, 10, 1000)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.hist(x, 100) # draw samples
ax2.plot(linspace, np.exp(gmm.score_samples(linspace.reshape(-1, 1))), 'r') # draw the GMM density
plt.show()

The output shows the two-component histogram with the fitted GMM density curve overlaid in red.

Upvotes: 7
