Luis Cruz

Reputation: 1644

Fitting an empirical CDF curve to find exact value

I am trying to find the exact CDF value of any number using the empirical CDF. What's the best way to get this value? Can I use a fitting tool and then estimate it from the fitted function?

[f,x] = ecdf(samples);

i.e., how do I find the function that best fits my empirical CDF, so that I can get the CDF value of any number I want?

These are my samples:

[image of the samples]

Upvotes: 0

Views: 4270

Answers (1)

Mark Mikofski

Reputation: 20208

You can get an approximate value of f(x) by assuming a distribution, for example the normal, and finding the location (μ) and scale (σ) parameters that best fit the curve in a least-squares sense.

Here's an example set of noisy test data based on a normal distribution (similar to your sampled data):

>> % ytest = f(xtest, mutest, sigtest)  % sample test data
>> xtest = linspace(-10, 10, 100);  % independent variable, linearly spaced
>> mutest = rand(1, 1) - 0.5  % random location parameter (mean)
mutest =
    0.2803
>> sigtest = 1 + rand(1, 1)  % random scale parameter (standard deviation)
sigtest =
    1.6518
>> ytest = normcdf(xtest, mutest, sigtest) + rand(1, 100) / 10;  % noisy CDF values

Now you can use fminsearch to find the location and scale parameters, assuming a normal distribution. fminsearch needs an objective function to minimize, so we create an anonymous function that returns the norm of the residuals between the ideal normal cumulative distribution function and the test data. The function takes two parameters, [μ, σ], which we pass as a single vector. We also need to provide fminsearch with an initial guess.

>> % objective function with normal distribution
>> % mu(1) = location parameter (mean)
>> % mu(2) = scale parameter (standard deviation)
>> obj_func = @(mu) norm(normcdf(xtest, mu(1), mu(2)) - ytest);
>> mu0 = [0, 1];  % initial guesses for mean and standard deviation
>> mu = fminsearch(obj_func, mu0);
>> sigma = mu(2)  % best-fit standard deviation
sigma =
    1.7399
>> mu = mu(1)  % best-fit mean
mu =
   -0.0386
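
For reference, the objective minimized above can be written out as follows. Here Φ is the standard normal CDF, x_i and y_i are the entries of xtest and ytest, and n = 100 is the number of points. The label J is only introduced here for illustration; it does not appear in the code.

```latex
J(\mu, \sigma)
  = \left\lVert \Phi\!\left(\frac{x - \mu}{\sigma}\right) - y \right\rVert_2
  = \sqrt{\sum_{i=1}^{n} \left( \Phi\!\left(\frac{x_i - \mu}{\sigma}\right) - y_i \right)^{2}}
```

Because the square root is monotonic, minimizing J gives the same μ and σ as minimizing the sum of squared residuals, which is why this counts as a least-squares fit.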

Now you can evaluate the fitted CDF at any x, using μ and σ with the normcdf function:

>> y = normcdf(xtest, mu, sigma);

[image: fitting normal distribution]
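
The same approach should carry over to the output of ecdf from the question. The following is a minimal sketch, not a definitive implementation: the samples are synthetic stand-ins (the real data is not available here), the names p, p0, xq, Fq and F_emp are made up for illustration, and the data is assumed to be roughly normal.

```matlab
% Hypothetical stand-in for the asker's data: 500 draws from N(2, 1.5^2)
samples = 2 + 1.5 * randn(500, 1);

% Empirical CDF: f holds the CDF values at the sorted sample points x
[f, x] = ecdf(samples);

% Least-squares objective; p(1) = mean, p(2) = standard deviation
obj_func = @(p) norm(normcdf(x, p(1), p(2)) - f);
p0 = [mean(samples), std(samples)];  % start from the sample moments
p = fminsearch(obj_func, p0);

% Evaluate the fitted CDF at an arbitrary point
xq = 2.5;
Fq = normcdf(xq, p(1), p(2));

% For comparison: the empirical CDF at xq is the fraction of samples <= xq
F_emp = mean(samples <= xq);
```

Starting fminsearch from the sample mean and standard deviation keeps the search close to sensible values. If Fq and F_emp disagree noticeably, the normal assumption is probably a poor match for the data.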

MATLAB offers many types of probability distributions. If you don't know what type of distribution your data follows, and your population has only positive values, then one candidate is the Weibull distribution, which has a flexible three-parameter form: shape, scale, and location. See "Estimate parameters of 3-parameter Weibull" in the MATLAB documentation. wblcdf itself takes only the scale and shape parameters, so replace normcdf with wblcdf and apply the location by shifting x:

>> xtest = linspace(0, 10, 100);
>> mutest = rand(1, 1) - 0.5; % location
>> mutest
mutest = -0.35813
>> sigtest = 1 + rand(1, 2); % scale and shape
>> sigtest
sigtest =
   1.6441   1.3324
>> ytest = wblcdf(xtest-mutest, sigtest(1), sigtest(2)) + rand(1, 100) / 10;
>> % objective function with Weibull distribution
>> % mu(1) = location parameter
>> % mu(2) = scale parameter
>> % mu(3) = shape parameter
>> obj_func = @(mu) norm(wblcdf(xtest-mu(1), mu(2), mu(3)) - ytest);
>> mu0 = [0, 1, 1];  % initial guesses for location, scale, and shape
>> mu = fminsearch(obj_func, mu0);
>> mu
mu =
  -0.85695   1.94229   1.89319
>> shape = mu(3);  % best fit shape
>> sigma = mu(2);  % best fit scale
>> mu = mu(1);  % best fit location
>> y = wblcdf(xtest-mu, sigma, shape);

[image: fitting Weibull distribution]
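
If it is unclear which distribution to assume, one rough way to choose is to fit each candidate to the same empirical CDF and compare the residual norms. The sketch below assumes synthetic Weibull samples as a stand-in for real data, and the names obj_norm, obj_wbl, p_norm, q_wbl, res_norm and res_wbl are hypothetical.

```matlab
% Hypothetical positive-valued samples: Weibull with scale 2 and shape 1.5
samples = wblrnd(2, 1.5, 1000, 1);

% Empirical CDF values f at the sorted sample points x
[f, x] = ecdf(samples);

% Normal candidate: p = [mean, standard deviation]
obj_norm = @(p) norm(normcdf(x, p(1), p(2)) - f);
[p_norm, res_norm] = fminsearch(obj_norm, [mean(samples), std(samples)]);

% Weibull candidate: q = [location, scale, shape]
obj_wbl = @(q) norm(wblcdf(x - q(1), q(2), q(3)) - f);
[q_wbl, res_wbl] = fminsearch(obj_wbl, [0, mean(samples), 1]);

% The second output of fminsearch is the objective value at the minimum
fprintf('normal residual:  %.4f\n', res_norm);
fprintf('Weibull residual: %.4f\n', res_wbl);
```

A smaller residual means the fitted curve tracks the empirical CDF more closely. The three-parameter Weibull has one more free parameter than the normal, so treat a small difference with caution.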

Upvotes: 4
