droidmad
droidmad

Reputation: 837

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of peaks. I want to approximate the curve by fitting a gaussian at each peak. How should I go about finding the optimized gaussian parameters ? I would like to know if there is any inbuilt function which will make my task simpler.

Edit

I have fixed mean of gaussians and tried to optimize on sigma using lsqcurvefit() in matlab. MSE is less. However, I have an additional hard constraint that the value of approximate curve should be equal to the original function at the peaks. This constraint is not satisfied by my model. I am pasting current working code here. I would like to have a solution which obeys the hard constraint at peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.

fun = @(x,xdata)myFun(x,xdata,pks,locs); %pks,locs are the peak locations and amplitudes already available
x0=w(1:6)*0.25; % my initial guess based on domain knowledge 

[sigma resnorm] = lsqcurvefit(fun,x0,xdata,ydata); %xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure;plot(ydata,'r');hold on;plot(recons);

    function f=myFun(sigma,xdata,a,c)
       % a is constant , c is mean of individual gaussians
       f=zeros(size(xdata));
       for i = 1:6 %use 6 gaussians to approximate function
          f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));      
       end           
    end

Upvotes: 0

Views: 1083

Answers (2)

user20160
user20160

Reputation: 1394

If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.

Say you have x and y, which are samples from the curve you want to approximate.

First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.

Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:

L(sigma) = sum((y - g(sigma)) .^ 2)

The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.

If you have the Mathworks optimization toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The curve fitting toolbox is probably an alternative. Otherwise, you can use an open source optimization routine (check out cvxopt).

A couple things to note. You need to impose the constraint that all values in sigma are greater than zero. You can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution is different, depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try with multiple different initial guesses, and pick the best result.

Edited to add:

In python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().

Edit 2 (response to edited question):

If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.

If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().

Upvotes: 2

sajed
sajed

Reputation: 41

you can use Expectation–Maximization algorithm for fitting Mixture of Gaussians on your data. it don't care about data dimension. in documentation of MATLAB you can lookup gmdistribution.fit or fitgmdist.

Upvotes: 1

Related Questions