Ashish
Ashish

Reputation: 739

Fitting two Gaussians on less expressed bimodal data

I am trying to fit two Gaussians on a bimodal distribution data, but most of the optimisers give me wrong results always based on starting guess as below

I also tried GMM from scikit-learn, which didn't help much. I am wondering what I may be doing wrong and what is better approach so that we can test and fit the bimodal data. One of the example code using curve_fit and data is as follows

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def gauss(x,mu,sigma,A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)

def bimodal(x,mu1,sigma1,A1,mu2,sigma2,A2):
    return gauss(x,mu1,sigma1,A1)+gauss(x,mu2,sigma2,A2)

def rmse(p0):
    mu1,sigma1,A1,mu2,sigma2,A2 =p0
    y_sim = bimodal(x,mu1,sigma1,A1,mu2,sigma2,A2)
    rms = np.sqrt((y-y_sim)**2/len(y))

data = pd.read_csv('data.csv')
x, y = data.index, data['24hr'].values

expected=(400,720,500,700,774,150)

params,cov=curve_fit(bimodal,x,y,expected, maxfev=100000)
sigma=np.sqrt(np.diag(cov))
plt.plot(x,bimodal(x,*params),color='red',lw=3,label='model')
plt.plot(x,y,label='data')
plt.legend()
print(params,'\n',sigma)

Upvotes: 0

Views: 3055

Answers (1)

mikuszefski
mikuszefski

Reputation: 4043

You could try a skewed Gaussian. With the parameter alpha->0 this becomes a normal Gaussian, allowing somewhat for comparison:

import matplotlib.pyplot as plt
import numpy as np
from scipy.special import erf
from scipy.optimize import minimize,leastsq, curve_fit


def gauss(x):
    return np.exp( -0.5 * x**2 / np.sqrt( 2 * np.pi ) )


def Phi(x):
    return ( 0.5 * ( 1. + erf(x/np.sqrt(2) ) ) )


def skewed(x, x0, s, a):
    return 2./s * gauss( ( x - x0 ) / s ) * Phi( a * ( x - x0 ) / s)


def my_double_peak(x, A0, x0, s0, a, A1, x1, s1):
    return A0 * skewed( x, x0, s0, a ) + A1 / s1 * gauss( ( x - x1 ) / s1 )

data = np.loadtxt("data.csv", skiprows=1, delimiter=',')
xData = range(len(data))

fitResult, ier = curve_fit( my_double_peak, xData, data[:,1], p0=(45e3, 400., 60,4. ,15e3, 700., 30 )  ) 
print fitResult
bestFit = [my_double_peak(x, *fitResult ) for x in range(len(data)) ]


fig1=plt.figure(1)
ax= fig1.add_subplot( 1, 1, 1 )
ax.plot( data[:,1] )
ax.plot( bestFit )

plt.show()

Providing:

>>> [  6.77971459e+04   3.48661227e+02   8.60938473e+01   
       8.43422033e+00   3.86660495e+03   7.22528635e+02   
       2.49055201e+01]

Fit with skewed gaussian

Upvotes: 6

Related Questions