jerbear

Reputation: 391

Numpy percentiles with linear interpolation - wrong value?

The linear interpolation formula for percentiles is:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

Suppose I have this list with 16 observations:

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

I pass it as a numpy array and calculate the 85th percentile using linear interpolation.

import numpy as np

np_test = np.asarray(test)
np.percentile(np_test, 85, interpolation='linear')

The result I get is 22.5. However, I don't think that's correct. The index of the 85th percentile is .85 * 16 = 13.6. Thus, the fractional part is .6. The 13th value is 21, so i = 21. The 14th value is 23, so j = 23. The linear formula should then yield:

21 + (23 - 21) * .6 = 21 + 2 * .6 = 21 + 1.2 = 22.2

The correct answer is 22.2. Why am I getting 22.5 instead?

Upvotes: 7

Views: 8060

Answers (1)

AGN Gazer

Reputation: 8378

len(test) is 16, but the distance from the first element to the last is one less: the indices run from 0 to 15, so the distance is 15. The index of the 85th percentile is therefore 15 * 0.85 = 12.75. With 0-based indexing, test[12] = 21 and test[13] = 23, so linear interpolation with the fractional part 0.75 gives 21 + 0.75 * (23 - 21) = 22.5. The correct answer is 22.5.
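To check the arithmetic above, here is a minimal sketch that redoes the calculation by hand and compares it with NumPy. The helper name linear_percentile is made up for this illustration and is not part of NumPy; it only assumes the default linear rule described in this answer.

```python
import math
import numpy as np

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]

def linear_percentile(values, q):
    """Hypothetical helper: percentile by linear interpolation over indices 0..N-1."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0   # fractional 0-based index, e.g. 15 * 0.85
    lo = math.floor(pos)                # index of the lower neighbour
    hi = min(lo + 1, len(data) - 1)     # index of the upper neighbour, clamped at the end
    fraction = pos - lo                 # fractional part of the index
    return data[lo] + (data[hi] - data[lo]) * fraction

manual = linear_percentile(test, 85)
from_numpy = float(np.percentile(np.asarray(test), 85))
print(manual, from_numpy)
```

Both values should agree, because the fractional index is taken over N - 1 = 15 intervals rather than over N = 16 observations.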

From the Notes section of the documentation of numpy.percentile():

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.

The key here is, in my opinion, "the way from the minimum to the maximum". Let's say we number elements from 1 to 16. Then the "position" of the first element is 1 and the "position" (along the "coordinate axis of indices") of the last element in test is 16. Therefore the distance between them is 16-1=15.
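One way to picture "the way from the minimum to the maximum" is to spread the 16 sorted values evenly over the interval from 0 to 1 and read off the value at 0.85. The sketch below does that with np.linspace and np.interp; it is an illustration of the idea, not a claim about how np.percentile is implemented internally.

```python
import numpy as np

test = [0, 1, 5, 5, 5, 6, 6, 7, 7, 8, 11, 12, 21, 23, 23, 24]
data = np.sort(np.asarray(test, dtype=float))

# Fraction of the way from the minimum (0.0) to the maximum (1.0)
# assigned to each sorted value: 0, 1/15, 2/15, ..., 1.
fractions = np.linspace(0.0, 1.0, len(data))

p0 = np.interp(0.0, fractions, data)     # 0th percentile: the minimum
p85 = np.interp(0.85, fractions, data)   # 85% of the way along
p100 = np.interp(1.0, fractions, data)   # 100th percentile: the maximum

print(p0, p85, p100)
print(np.isclose(p85, np.percentile(data, 85)))
```

The 16 values are separated by 15 equal steps, which is why 85% of the way lands between test[12] and test[13] rather than between the 13th and 14th steps of a 16-step scale.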

Upvotes: 11
