Moritz

Reputation: 5408

Calculate percentile of 2D array

I have size classes, and for each size class I have measured counts:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
size_class = np.linspace(0,9,10)
counts = norm.pdf(size_class, 5,1) # synthetic data
counts_cumulative_normalised = np.cumsum(counts)/counts.sum() # summing up and normalisation
plt.plot(size_class,counts_cumulative_normalised)
plt.show()

So if I want to calculate percentiles of the size, I have to interpolate to find the size at the desired percentile.

Is there a built-in function that takes these two vectors as arguments and returns the desired percentiles?
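The interpolation described above can be done by hand. Below is a minimal sketch. It assumes linear interpolation between neighbouring size classes and strictly increasing cumulative values, which holds here because every count is positive. `size_at_percentile` is a hypothetical helper name. NumPy's built-in `np.interp` gives the same numbers inside the data range, provided the two vectors are passed in swapped order (cumulative fraction as x, size as y):

```python
import numpy as np
from scipy.stats import norm

# Same synthetic data as in the question
size_class = np.linspace(0, 9, 10)
counts = norm.pdf(size_class, 5, 1)
cdf_emp = np.cumsum(counts) / counts.sum()  # normalised cumulative counts, increasing

def size_at_percentile(q, sizes, cum):
    """Size at which the cumulative fraction reaches q (0..1), by linear interpolation.

    Hypothetical helper; assumes `cum` is strictly increasing.
    """
    q = np.asarray(q, dtype=float)
    # Index of the first class whose cumulative fraction is >= q,
    # clipped so that i - 1 and i are always valid indices
    i = np.clip(np.searchsorted(cum, q), 1, len(cum) - 1)
    # Linear interpolation between the two bracketing size classes
    frac = (q - cum[i - 1]) / (cum[i] - cum[i - 1])
    return sizes[i - 1] + frac * (sizes[i] - sizes[i - 1])

print(size_at_percentile([0.1, 0.5, 0.9], size_class, cdf_emp))
# NumPy's built-in linear interpolation, with x and y swapped
print(np.interp([0.1, 0.5, 0.9], cdf_emp, size_class))
```

Outside the range of the cumulative values, `np.interp` clamps to the end sizes, whereas this helper extrapolates linearly from the end classes.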

Upvotes: 0

Views: 2134

Answers (1)

CT Zhu

Reputation: 54340

If you don't know whether the data are normally distributed and you want the percentiles based on the empirical cumulative distribution function, you can use an interpolation approach.

In [63]:

plt.plot(size_class,counts_cumulative_normalised)
Out[63]:
[<matplotlib.lines.Line2D at 0x10c72d3d0>]

[Plot: counts_cumulative_normalised against size_class]

In [69]:
# What percentile does size 4 correspond to?
from scipy import interpolate
intp=interpolate.interp1d(size_class, counts_cumulative_normalised, kind='cubic')
intp(4)
Out[69]:
array(0.300529305241782)
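The question actually asks the inverse: which size corresponds to a given percentile. Under the same assumptions (a strictly increasing cumulative curve and the synthetic data from the question), one sketch is to build the interpolator with the arguments swapped. Using `kind='linear'` instead of `'cubic'` here is my own assumption. The cumulative values bunch up near 0 and 1, and a cubic fit through those points may overshoot:

```python
import numpy as np
from scipy import interpolate
from scipy.stats import norm

size_class = np.linspace(0, 9, 10)
counts = norm.pdf(size_class, 5, 1)  # synthetic data, as in the question
counts_cumulative_normalised = np.cumsum(counts) / counts.sum()

# Inverse lookup: cumulative fraction -> size.
# Valid because the cumulative curve is strictly increasing (all counts > 0).
inv = interpolate.interp1d(counts_cumulative_normalised, size_class, kind='linear')

percentiles = np.array([10, 25, 50, 75, 90])
sizes = inv(percentiles / 100.0)  # convert percent to fraction before the lookup
print(dict(zip(percentiles, sizes)))
```

By default `interp1d` raises a `ValueError` for fractions outside the range of `counts_cumulative_normalised`. Pass `bounds_error=False` together with a `fill_value` if that behaviour is not wanted.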

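I know you are only presenting synthetic data, but note that because you take only a few sample points, your cumulative curve does not match the true cumulative distribution function. The normalised cumulative sum lies above the true CDF, so sizes read off at a given percentile come out too small. See this comparison: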

plt.plot(size_class,counts_cumulative_normalised)
plt.plot(size_class,norm.cdf(size_class, 5, 1))

[Plot: counts_cumulative_normalised and norm.cdf(size_class, 5, 1) against size_class]
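The mismatch comes from the coarse sampling, not from the cumulative-sum idea itself. With only 10 size classes, each point carries the mass of a whole unit of size. A quick check, assuming the synthetic normal data from the question, is to sample the same density more finely and watch the largest gap to `norm.cdf` shrink. `max_cdf_error` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import norm

def max_cdf_error(n_points):
    """Largest absolute gap between the normalised cumulative sum and the true normal CDF."""
    x = np.linspace(0, 9, n_points)
    counts = norm.pdf(x, 5, 1)               # synthetic "counts" sampled at n_points sizes
    ecdf = np.cumsum(counts) / counts.sum()  # same construction as in the question
    return np.max(np.abs(ecdf - norm.cdf(x, 5, 1)))

for n in (10, 100, 1000):
    print(n, max_cdf_error(n))
```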

Upvotes: 1
