Reputation: 1475
I need to fit a multimodal distribution that represent particle size measurements. These measurements could for example look like this:
Now I would like to fit these curves. With the help of this answer I was able to get quite decent results for a unimodal distribution function:
fun = @(p,x)(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2)));
by scaling the resulting parameters like this:
[yM_in, pp_in] = max(DPF_in_mean);
xM_in = x_data(pp_in);
[yM_out, pp_out] = max(DPF_out_mean);
xM_out = x_data(pp_out);
xR_in = x_data / xM_in;
yR_in = DPF_in_mean / yM_in;
xR_out = x_data / xM_out;
yR_out = DPF_out_mean / yM_out;
opts = optimoptions('lsqcurvefit','TolX',1e-4,'TolFun',1e-8);
p0 = [1,1,1];
p_in = lsqcurvefit(fun,p0,xR_in,yR_in,[],[],opts);
p_out = lsqcurvefit(fun,p0,xR_out,yR_out,[],[],opts);
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
p_out_scaled = [ yM_out * p_out(1) * xM_out, p_out(2) + log(xM_out), p_out(3) ];
However if I plot the resulting fit it is obvious that a unimodal distribution is not sufficient to fit the measurements:
In the Wikipedia article about multimodal distribution it seems I could just blend a second distribution in like this:
fun = @(p,x)(p(7)*(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2))) + (1 - p(7))*(p(4)./x .* 1./(p(5)*sqrt(2*pi)).*exp(-(log(x)-p(6)).^2./(2*p(5)^2))));
However I don't know how I need to integrate the additional parameters in the scaling
p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
because I didn't really understand what is happening in this scaling step.
How can I use a multimodal distribution to fit my measurements?
EDIT
The used data is the following:
x_data = [4.87000000000000e-09 5.62000000000000e-09 6.49000000000000e-09 7.50000000000000e-09 8.66000000000000e-09 ...
1.00000000000000e-08 1.15500000000000e-08 1.33400000000000e-08 1.54000000000000e-08 1.77800000000000e-08 ...
2.05400000000000e-08 2.37100000000000e-08 2.73800000000000e-08 3.16200000000000e-08 3.65200000000000e-08 ...
4.21700000000000e-08 4.87000000000000e-08 5.62300000000000e-08 6.49400000000000e-08 7.49900000000000e-08 ...
8.66000000000000e-08 1.00000000000000e-07 1.15480000000000e-07 1.33350000000000e-07 1.53990000000000e-07 ...
1.77830000000000e-07 2.05350000000000e-07 2.37140000000000e-07 2.73840000000000e-07 3.16230000000000e-07 ...
3.65170000000000e-07 4.21700000000000e-07 4.86970000000000e-07 5.62340000000000e-07 6.49380000000000e-07 ...
7.49890000000000e-07 8.65960000000000e-07 1.00000000000000e-06];
DPF_in_mean = [188318640795.745 360952462222.222 750859638450.704 2226776878843.93 4845941940346.82 7979258430057.80 ...
11010887350289.0 13462058712138.7 15090350247398.8 15991756383815.0 16680978441618.5 17862081914450.9 ...
20071390890173.4 23460963364161.9 27630428508670.5 31777265780346.8 35520451433526.0 38587652184971.1 ...
40516972485549.1 41326812092485.6 41127130682080.9 40038712485549.1 37976259664739.9 34725415132948.0 ...
30177578265896.0 24546703179190.8 18400851109826.6 12500471611560.7 7540309609248.56 3912091102658.96 ...
1632974141040.46 458500289086.705 126012891030.303 0 0 0 7276263267.44526 11203995842.0392];
DPF_out_mean = [444898373533.333 1032357396444.44 1675044380444.44 2316141430222.22 2852971589555.56 3151959865111.11 ...
3134892475777.78 2828026308000.00 2325761940666.67 1745907627777.78 1192912799111.11 742253282222.222 ...
430349362888.889 255820144555.556 188235813444.444 181970493622.222 204829338533.333 233009821977.778 ...
243007623333.333 230736732777.778 202426609488.889 169758857200.000 140604138622.222 116482776222.222 ...
95076737155.5556 74172071777.7778 53672033733.3333 35251323911.1111 20813708255.5556 11102006362.8889 ...
5497173092.96089 2625918349.76536 1471042995.80373 1012939492.96541 751738952.194595 589422111.731818 ...
479373451.936508 378359645.767442];
Upvotes: 1
Views: 745
Reputation: 4647
Here is one possibility that might be of some use. Because the large peaks in the data obscure the smaller peaks, subtracting the larger peak data leaves the small peak data isolated for analysis. If you know the form of the large peak, you can fit the data to that and then have only one of the two bimodal peaks remaining for analysis in each data set. Once you find the form of the second peak, you can the start over by fitting the sum of the two using the previous analysis fitted values as initial parameter values for the final analysis.
I have performed an equation search on both data sets to find a suitable peak equation for the main peaks in each set, here are my results. No data transformation or preconditioning was done and I used the raw data as posted.
For DPF_in_mean I have the main peak as:
def Peak_LogNormalAShifted_model(x_in): # from zunzun.com using DPF_in_mean
# coefficients
a = 4.2518873952455000E+13
b = -1.6088042345890798E+01
c = 6.0084502637701809E-01
d = -2.3091703726006359E-08
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in-d)-b) / c, 2.0))
For DPF_out_mean I have the main peak as:
def Peak_LogNormalA_model(x_in): # from zunzun.com using DPF_out_mean
# coefficients
a = 3.1863877879345913E+12
b = -1.8334716040160675E+01
c = 4.4913908739937525E-01
return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in)-b) / c, 2.0))
Upvotes: 1