alvarezcl

Reputation: 579

Curve_Fit not returning expected values

I have code here that draws from two gaussian distributions with an equal number of points.

Ultimately I want to simulate noise, but first I'm trying to understand why, when I draw from two Gaussians whose means are far apart, curve_fit doesn't return their average mean value as I expected. It doesn't do that.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import gauss

N_tot = 1000
# Draw from the major gaussian. Note the number N. It is
# the main parameter in obtaining your estimators.
mean = 0; sigma = 1; var = sigma**2; N = 100
A = 1/np.sqrt((2*np.pi*var))
points = gauss.draw_1dGauss(mean,var,N)

# Now draw from a minor gaussian. Note Np
meanp = 10; sigmap = 1; varp = sigmap**2; Np = N_tot-N
pointsp = gauss.draw_1dGauss(meanp,varp,Np)
Ap = 1/np.sqrt((2*np.pi*varp))      

# Now implement the sum of the draws by concatenating the two arrays.
points_tot = np.concatenate([points, pointsp])
bins_tot = len(points_tot)//5
hist_tot, bin_edges_tot = np.histogram(points_tot,bins_tot,density=True)
bin_centres_tot = (bin_edges_tot[:-1] + bin_edges_tot[1:])/2.0

# Initial guess
p0 = [A, mean, sigma]

# Result of the fit
coeff, var_matrix = curve_fit(gauss.gaussFun, bin_centres_tot, hist_tot, p0=p0)

# Get the fitted curve
hist_fit = gauss.gaussFun(bin_centres, *coeff)
plt.figure(5); plt.title('Gaussian Estimate')
plt.suptitle('Gaussian Parameters: Mu = '+ str(coeff[1]) +' , Sigma = ' + str(coeff[2]) + ', Amplitude = ' + str(coeff[0]))
plt.plot(bin_centres,hist_fit)
plt.draw()        

# Error on the estimates
error_parameters = np.sqrt(np.diag(var_matrix))

The returned parameters are still centered about 0 and I'm not sure why; they should be centered around 10.

Edit: I changed the integer-division portions, but the fit still isn't good. I should get a mean of about ~10, since most of my points are drawn from that distribution (the one labeled "minor" in the code, which actually has 900 of the 1000 points).

Upvotes: 0

Views: 301

Answers (1)

wltrimbl

Reputation: 116

You find that the least-squares optimization converges to the larger of the two peaks.

The least-squares optimum does not find the "average mean value" of the two component distributions; the algorithm merely minimizes the squared error, which usually means fitting the biggest peak.

When the distribution is this lopsided (90% of the samples come from the larger of the two peaks), the error terms from the main peak wash out both the local minimum at the smaller peak and the one between the peaks.

You can only get the fit to converge to a point in the center when the peaks are nearly equal in size; otherwise you should expect least-squares to find the "strongest" peak, unless it gets stuck in a local minimum.
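To see this concretely, here is a minimal self-contained sketch (my own toy numbers, not your gauss module): draw a 100/900 mixture, fit a single Gaussian starting from a neutral guess (the sample mean and standard deviation), and the fitted mean lands on the dominant peak near 10 rather than on the average of the two component means.

```python
import numpy as np
from scipy.optimize import curve_fit

def single_gauss(x, A, mu, sigma):
    # Single-Gaussian model with a free amplitude A
    return A * np.exp(-(x - mu)**2 / (2. * sigma**2))

rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 1, 100),    # minor component
                          rng.normal(10, 1, 900)])  # major component
hist, edges = np.histogram(samples, bins=80, density=True)
centres = (edges[:-1] + edges[1:]) / 2.0

# Neutral starting guess: overall sample mean and standard deviation
p0 = [0.1, samples.mean(), samples.std()]
coeff, var_matrix = curve_fit(single_gauss, centres, hist, p0=p0)
print(coeff[1])  # the fitted mean sits on the dominant peak, not near 5
```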

With the following pieces, I can run your code:

bin_centres = bin_centres_tot

def draw_1dGauss(mean, var, N):
    from numpy import sqrt
    from scipy.stats import norm
    return norm.rvs(loc=mean, scale=sqrt(var), size=N)

def gaussFun(bin_centres, *coeff):
    from numpy import exp
    A, mean, sigma = coeff[0], coeff[1], coeff[2]
    # A is the fitted amplitude, so it must actually appear in the model
    return A * exp(-(bin_centres - mean)**2 / (2. * sigma**2))

plt.hist(points_tot, density=True, bins=40)
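If the goal is to recover both components rather than just the strongest one, the model itself has to contain both peaks. A sketch (again with my own toy numbers, not your gauss module) fitting a sum of two Gaussians:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gauss(x, A1, mu1, s1, A2, mu2, s2):
    # Sum of two Gaussian components, each with its own amplitude
    return (A1 * np.exp(-(x - mu1)**2 / (2. * s1**2)) +
            A2 * np.exp(-(x - mu2)**2 / (2. * s2**2)))

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(0, 1, 100),
                          rng.normal(10, 1, 900)])
hist, edges = np.histogram(samples, bins=80, density=True)
centres = (edges[:-1] + edges[1:]) / 2.0

# One starting guess near each visible peak
p0 = [0.05, 0.0, 1.0, 0.35, 10.0, 1.0]
coeff, var_matrix = curve_fit(two_gauss, centres, hist, p0=p0)
# coeff[1] and coeff[4] recover the two component means
```

With starting guesses near each visible peak this converges to both component means; if the number of components is unknown, a mixture model (e.g. sklearn's GaussianMixture) is usually more robust than hand-tuned curve_fit starting points.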

Upvotes: 1
