Reputation: 231
My histogram plot clearly shows two peaks, but when I curve-fit it with a double Gaussian, the fit shows just one peak. I have followed almost every related answer on Stack Overflow but failed to get the correct result. My teacher previously did this in Fortran and got two peaks.
In one trial I used leastsq from Python's scipy.optimize. Should I also post my data?
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

binss = int((max(x) - min(x))/0.05)   # 0.05 is my bin width; hist needs an integer bin count
n, bins, patches = plt.hist(x, binss, color='grey')   # gives the histogram

x_avg = (bins[:-1] + bins[1:])/2   # bin centres
y_real = n

def gauss(x, A, mu, sigma):
    # vectorised Gaussian, evaluated over the whole array at once
    return A*np.exp(-(x - mu)**2/(2.*sigma**2))
A1, A2, m1, m2, sd1, sd2 = [25, 30, 0.3, 0.6, -0.9, -0.9]
#Initial guesses for leastsq
p = [A1, A2, m1, m2, sd1, sd2]
y_init = gauss(x_avg, A1, m1, sd1) + gauss(x_avg, A2, m2, sd2) #initially guessed y
def residual(p, x, y):
    A1, A2, m1, m2, sd1, sd2 = p
    y_fit = gauss(x, A1, m1, sd1) + gauss(x, A2, m2, sd2)
    return y - y_fit
sf = leastsq(residual, p, args = (x_avg , y_real))
y_fitted1 = gauss(x_avg, sf[0][0], sf[0][2], sf[0][4])
y_fitted2 = gauss(x_avg, sf[0][1], sf[0][3], sf[0][5])
y_fitted = y_fitted1 + y_fitted2
plt.plot(x_avg, y_init, 'b', label='Starting Guess')
plt.plot(x_avg, y_fitted, color = 'red', label = 'Fitted Data')
plt.plot(x_avg, y_fitted1, color= 'black', label = 'Fitted1 Data')
plt.plot(x_avg, y_fitted2, color = 'green', label = 'Fitted2 Data')
Even the figure I get is not smooth: x_avg holds only 54 points. Please help; I can't even post the figure here.
Plotting in MATLAB gave the correct results. Reason: MATLAB uses a trust-region algorithm instead of the Levenberg-Marquardt algorithm, which is not suitable for bound constraints. Also, the correct results come only when the data are modelled as a sum of 3 individual Gaussians, not 2.
How do I decide which algorithm to use, and when?
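For what it's worth, newer SciPy exposes both families of algorithms through scipy.optimize.least_squares: method='lm' is Levenberg-Marquardt (no bounds allowed) and method='trf' is a trust-region reflective method that does support bound constraints. A minimal sketch of a bounded two-Gaussian fit, using synthetic two-peak data as a stand-in for the real data set:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
# synthetic two-peak "histogram" data (stand-in for the real data)
x = np.linspace(-2, 2, 80)
y = (25*np.exp(-(x - 0.3)**2/(2*0.2**2))
     + 30*np.exp(-(x + 0.6)**2/(2*0.25**2)))
y += rng.normal(0, 1, x.size)

def residual(p, x, y):
    A1, A2, m1, m2, sd1, sd2 = p
    model = (A1*np.exp(-(x - m1)**2/(2*sd1**2))
             + A2*np.exp(-(x - m2)**2/(2*sd2**2)))
    return y - model

p0 = [20, 20, 0.2, -0.5, 0.3, 0.3]
# trust-region reflective handles bounds, e.g. forcing sigma > 0
lb = [0, 0, -2, -2, 1e-3, 1e-3]
ub = [100, 100, 2, 2, 2, 2]
sol = least_squares(residual, p0, args=(x, y), bounds=(lb, ub), method='trf')
print(sol.x)   # fitted A1, A2, m1, m2, sd1, sd2
```

With bounds present, least_squares silently requires a bound-aware method; asking for 'lm' together with bounds raises an error, which is essentially the MATLAB-vs-leastsq difference described above.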
Upvotes: 2
Views: 3291
Reputation: 231
I added another Gaussian term, so p took 9 parameters in total:
p = [A1, A2, A3, m1, m2, m3, sd1, sd2, sd3]
Then another term, y_fitted3, was defined and added to y_fitted. That gave a correct figure with the two peaks fitting perfectly, except that the curve was not smooth at all! Searching Stack Overflow then led me to use spline, i.e.
from scipy.interpolate import spline   # note: removed in newer SciPy; make_interp_spline is the current equivalent
and then at the end,
x_new = np.linspace(x_avg.min(),x_avg.max(),30000)
ysmooth = spline(x_avg, y_fitted, x_new)
plt.plot(x_new, ysmooth)
Then there it was.
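The three-component fit described above can be sketched end to end; the names residual3 and the synthetic stand-in data are mine, not from the original post. Note that evaluating the fitted model itself on a dense grid gives a smooth curve directly, without spline interpolation of the coarse fit:

```python
import numpy as np
from scipy.optimize import leastsq

def gauss(x, A, mu, sigma):
    return A*np.exp(-(x - mu)**2/(2.*sigma**2))

def residual3(p, x, y):
    A1, A2, A3, m1, m2, m3, sd1, sd2, sd3 = p
    model = (gauss(x, A1, m1, sd1) + gauss(x, A2, m2, sd2)
             + gauss(x, A3, m3, sd3))
    return y - model

# synthetic stand-in: 54 bin centres, three overlapping components
x_avg = np.linspace(-2, 2, 54)
true_p = [25, 30, 10, 0.3, -0.6, 0.0, 0.2, 0.25, 0.5]
y_real = -residual3(true_p, x_avg, np.zeros_like(x_avg))  # noiseless model values

p0 = [20, 20, 5, 0.2, -0.5, 0.1, 0.3, 0.3, 0.6]
pf, ier = leastsq(residual3, p0, args=(x_avg, y_real))

# dense evaluation of the fitted model -> smooth curve, no spline needed
x_new = np.linspace(x_avg.min(), x_avg.max(), 3000)
y_smooth = (gauss(x_new, pf[0], pf[3], pf[6])
            + gauss(x_new, pf[1], pf[4], pf[7])
            + gauss(x_new, pf[2], pf[5], pf[8]))
```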
Checking Wikipedia, it says that the Levenberg-Marquardt implementation in Python also uses a trust-region approach, so trying leastsq again gave the result. But I am still not clear about the difference seen in MATLAB. Extra input will be appreciated! Thank you.
Upvotes: 2
Reputation: 6186
Your problem seems to be about mixtures of Gaussians, also known as a Gaussian mixture model. There are several implementations; sklearn's is worth considering.
import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt

comp0 = np.random.randn(1000) - 5   # samples of the 1st component
comp1 = np.random.randn(1000) + 5   # samples of the 2nd component
x = np.hstack((comp0, comp1))       # merge them

# mixture.GMM was removed from sklearn; GaussianMixture is the current
# class, and it expects 2-D input of shape (n_samples, n_features)
gmm = mixture.GaussianMixture(n_components=2)   # GMM for two components
gmm.fit(x.reshape(-1, 1))                       # train it!

linspace = np.linspace(-10, 10, 1000)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist(x, 100)   # draw samples
# score_samples returns log-density, so exponentiate before plotting
ax2.plot(linspace, np.exp(gmm.score_samples(linspace.reshape(-1, 1))), 'r')
plt.show()
The output is the histogram of the samples with the fitted mixture density drawn in red on a second axis.
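Once fitted, the component parameters can be read straight off the model via the means_, covariances_, and weights_ attributes; this sketch assumes the newer GaussianMixture API and data similar to the example above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# two well-separated components, reshaped to (n_samples, 1)
x = np.hstack((rng.randn(1000) - 5, rng.randn(1000) + 5)).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

means = sorted(gmm.means_.ravel())   # component centres
weights = gmm.weights_               # mixing proportions, sum to 1
print(means)     # roughly [-5, 5]
print(weights)   # roughly [0.5, 0.5]
```

This is often the real payoff of the mixture-model route: you get the peak locations and relative sizes as parameters, rather than reading them off a fitted curve.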
Upvotes: 7