Matthew Gunn

Reputation: 4519

Calculate percentiles? (Or more generally, evaluate function implicitly defined by 2 vectors x and y at many values z)

Let's say you have some vector z and you compute [f, x] = ecdf(z), so that your empirical CDF can be plotted with stairs(x, f).

Is there a simple way to compute what all the percentile scores are for z?

I could do something like:

It feels like there should be a simpler, already implemented way to do this...

Upvotes: 1

Views: 280

Answers (1)

Luis Mendo

Reputation: 112689

Let f be a monotone function defined at values x, for which you want to compute the inverse function at values p. In your case f is monotone because it is a CDF; and the values p define the desired quantiles. Then you can simply use interp1 to interpolate x, considered as a function of f, at values p:

z = randn(1,1e5); % example data: standard Gaussian distribution (zero mean, unit variance)
[f, x] = ecdf(z); % compute empirical CDF
p = [0.5 0.9 0.95]; % desired values for quantiles
result = interp1(f, x, p);

In an example run of the above code, this produces

result =
   0.001706069265714   1.285514249607186   1.647546848952448
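To make the role swap of x and f concrete, here is a minimal sketch on a few hand-picked values (these numbers are illustrative assumptions, not taken from the run above). It also shows that, by default, interp1 returns NaN for any p outside the range of f rather than extrapolating:

```matlab
% Toy monotone function: f increases with x (values chosen by hand).
x = [1 2 3 4];
f = [0.25 0.5 0.75 1];

% Swap the roles of x and f to evaluate the inverse function at p.
p = [0.5 0.6 1];
xq = interp1(f, x, p);       % default linear interpolation: 2, 2.4, 4

% A value of p outside [min(f), max(f)] is not extrapolated by default.
xout = interp1(f, x, 0.1);   % NaN
```

Note that interp1 needs distinct sample points, so f must be strictly monotone for this to work.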

For the specific case of computing quantiles p from data z, you can directly use quantile and thus avoid computing the empirical CDF:

result = quantile(z, p)

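The results may differ slightly because the two methods interpolate differently: the first interpolates between the steps of the empirical CDF, which sit at f = i/n for the i-th sorted sample, whereas quantile interpolates linearly between sorted samples placed at the (i-0.5)/n quantiles: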

>> quantile(z, p)
ans =
   0.001706803588857   1.285515826972878   1.647582486507752

For comparison, the theoretical values for the above example (Gaussian distribution) are

>> norminv(p)
ans =
                   0   1.281551565544601   1.644853626951472
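The small gap between the two methods is easier to see on a tiny sample. The following is a sketch with made-up data, assuming the Statistics Toolbox is available (ecdf, quantile and prctile all belong to it). Since the question asks about percentiles, it also shows that prctile is the same computation with p expressed in percent:

```matlab
z = [1 2 3 4];               % tiny hand-made sample

% Direct computation: quantile takes p in [0,1], prctile takes percent.
q1 = quantile(z, 0.5);       % 2.5
q2 = prctile(z, 50);         % same value as q1

% Inverting the empirical CDF instead.
[f, x] = ecdf(z);            % f = [0 0.25 0.5 0.75 1]', x = [1 1 2 3 4]'
q3 = interp1(f, x, 0.5);     % 2, because f equals exactly 0.5 at x = 2
```

With 1e5 samples the two conventions differ by far less than this, which is consistent with the outputs shown earlier.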

Upvotes: 3
