S.zhen

Reputation: 359

np.percentile not equal to quartiles

I'm trying to calculate the quartiles for an array of values in Python using NumPy.

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]

I would do the following:

import numpy as np

quartiles = np.percentile(X, range(0, 100, 25))
quartiles
# array([1.  , 2.5 , 5.  , 8.25])

But this is incorrect, as the 1st and 3rd quartiles should be 2 and 8.5, respectively.

This can be shown as follows:

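half = len(X) // 2          # integer division; len(X)/2 is a float in Python 3
Q1 = np.median(X[:half])    # median of the lower half
Q3 = np.median(X[half:])    # median of the upper half (X[len(X):] would be empty)
Q1, Q3
# (2.0, 8.5)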

I can't get my head around what np.percentile is doing to give a different answer. I'd be very grateful for any light shed on this.

Upvotes: 1

Views: 6113

Answers (1)

FLab

Reputation: 7476

There is no right or wrong here, simply different ways of calculating percentiles. The percentile is a well-defined concept in the continuous case, less so for discrete samples: the different methods give nearly the same result when there are many observations (relative to the number of duplicates), but they can differ noticeably for small samples, so you need to decide case by case which one makes the most sense.
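To see where 2.5 and 8.25 come from: NumPy's default ('linear') method places the p-th percentile at the fractional 0-based position p/100 * (n - 1) in the sorted data and interpolates linearly between the two neighbouring values. For X, n = 12, so the 25th percentile sits at position 2.75, between 1 and 3, giving 1 + 0.75 * (3 - 1) = 2.5. The minimal sketch below re-implements that rule by hand and checks it against np.percentile; the helper name linear_percentile is hypothetical, not part of NumPy.

```python
import math
import numpy as np

def linear_percentile(data, p):
    """Hypothetical helper: percentile by linear interpolation between
    neighbouring sorted values, intended to mirror NumPy's default method."""
    xs = sorted(data)
    pos = p / 100 * (len(xs) - 1)    # fractional 0-based index into sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)    # clamp so p = 100 stays in range
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]
for p in (25, 50, 75):
    print(p, linear_percentile(X, p), np.percentile(X, p))
# 25 2.5 2.5
# 50 5.0 5.0
# 75 8.25 8.25
```

The quartiles differ from the median-of-halves values only because the interpolation rule differs, not because either computation is buggy.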

To obtain your desired output, specify interpolation='midpoint' in the percentile function:

# NumPy >= 1.22 spells this option method='midpoint'; interpolation= is deprecated
quartiles = np.percentile(X, range(0, 100, 25), interpolation='midpoint')
quartiles    # array([1. , 2. , 5. , 8.5])
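The match with the question's median-of-halves values partly depends on the sample size. Working through the index arithmetic, for even n the 'midpoint' quartiles coincide with the medians of the two halves when n is a multiple of 4 (here n = 12), but generally not when n = 4k + 2. A minimal sketch comparing the two on X and on a hypothetical 10-element sample Y, assuming NumPy 1.22 or later for the method= keyword; the helper names are illustrative only:

```python
import numpy as np

def median_of_halves(data):
    """Illustrative helper: Q1/Q3 as medians of the lower and upper halves,
    split like the question's slicing (for odd n the middle value lands
    in the upper half)."""
    xs = sorted(data)
    half = len(xs) // 2
    return float(np.median(xs[:half])), float(np.median(xs[half:]))

def midpoint_quartiles(data):
    """Illustrative helper: Q1/Q3 from np.percentile with method='midpoint'."""
    q1, q3 = np.percentile(data, [25, 75], method="midpoint")
    return float(q1), float(q3)

X = [1, 1, 1, 3, 4, 5, 5, 7, 8, 9, 10, 1000]   # n = 12
Y = list(range(1, 11))                          # n = 10, hypothetical sample

print(median_of_halves(X), midpoint_quartiles(X))  # (2.0, 8.5) (2.0, 8.5)
print(median_of_halves(Y), midpoint_quartiles(Y))  # (3.0, 8.0) (3.5, 7.5)
```

So 'midpoint' reproduces the expected answer for this X, but it is a different convention from splitting the data into halves.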

I'd suggest having a look at the docs: http://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html

Upvotes: 3
