np.percentile not equal to quartiles

Question

I'm trying to calculate the quartiles for an array of values in python using numpy.

import numpy as np
X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]

I would do the following:

quartiles = np.percentile(X, range(0, 100, 25))
quartiles
# array([1.  ,  2.5 ,  5.  ,  8.25])

But this is incorrect, as the 1st and 3rd quartiles should be 2 and 8.5, respectively.

This can be shown as follows:

# X is already sorted, so splitting at len(X) // 2 gives the lower and upper halves
Q1 = np.median(X[:len(X)//2])
Q3 = np.median(X[len(X)//2:])
Q1, Q3
# (2.0, 8.5)
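The same "median of each half" idea can be wrapped in a small helper that also sorts its input and handles odd-length data. This is only a sketch: median_of_halves is a hypothetical name, it needs at least two values, and excluding the overall median from both halves when the length is odd is an assumed convention (other conventions include it in both halves).

```python
import numpy as np

def median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Hypothetical helper. Assumption: when the number of values is odd,
    the overall median is left out of both halves.
    """
    data = np.sort(np.asarray(values, dtype=float))
    n = len(data)
    half = n // 2
    lower = data[:half]            # lower half (never contains the middle value)
    upper = data[half + n % 2:]    # upper half, skipping the middle value if n is odd
    return float(np.median(lower)), float(np.median(upper))

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]
print(median_of_halves(X))  # (2.0, 8.5)
```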

I can't get my head around what np.percentile is doing to give a different answer. I'd be very grateful for any light shed on this.

FLab · Accepted Answer

There is no right or wrong here, simply different ways of calculating percentiles. The percentile is a well-defined concept in the continuous case, less so for discrete samples: the choice of method makes little difference when there are many observations (relative to the number of duplicates), but it can matter for small samples, and you need to decide case by case which method makes more sense.
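To see where the 2.5 and 8.25 in the question come from: by default ('linear'), np.percentile places the p-th percentile at the fractional position (n - 1) * p / 100 in the sorted data and interpolates linearly between the two neighbouring values. The sketch below re-implements that rule by hand for illustration only; linear_percentile is a hypothetical helper, not part of NumPy.

```python
import math
import numpy as np

def linear_percentile(values, p):
    """Hand-rolled version of NumPy's default 'linear' percentile rule (illustrative)."""
    data = sorted(values)
    pos = (len(data) - 1) * p / 100   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate between the two neighbouring sorted values
    return data[lo] + (data[hi] - data[lo]) * frac

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]
for p in (25, 50, 75):
    print(p, linear_percentile(X, p), np.percentile(X, p))
# p = 25: position 2.75 -> 1 + 0.75 * (3 - 1) = 2.5
# p = 75: position 8.25 -> 8 + 0.25 * (9 - 8) = 8.25
```

With n = 12, Q1 falls three quarters of the way between the sorted values 1 and 3, which is why the default result is 2.5 rather than 2.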

To obtain your desired output, specify interpolation='midpoint' in the percentile function:

quartiles = np.percentile(X, range(0, 100, 25), interpolation='midpoint')
quartiles    # array([ 1. ,  2. ,  5. ,  8.5])
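Several other discrete methods are available, and on this small sample they give noticeably different quartiles. The sketch below compares a few of them; it assumes NumPy 1.22 or later, where the keyword is method (interpolation is deprecated there). On older versions, replace method= with interpolation=.

```python
import numpy as np

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]

# Q1, median and Q3 under a few of NumPy's percentile methods
# (the `method` keyword requires NumPy >= 1.22)
for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    q = np.percentile(X, [25, 50, 75], method=method)
    print(f"{method:>8}: {q}")
```

With midpoint, Q1 and Q3 reproduce the 2 and 8.5 from the question; lower and higher simply pick one of the two neighbouring sorted values (1 or 3 for Q1, 8 or 9 for Q3) instead of interpolating between them.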

I'd suggest having a look at the docs: http://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html
