shin

Reputation: 32721

Different Q1 and Q3 values in python calculation from TI-nspire

I calculated the upper quartile (Q3 or 75%-tile) and lower quartile (Q1 or 25%-tile) using Numpy/Pandas and TI-nspire. But I get different values. Why does this happen?

By hand I get Q1 = (5+8)/2 = 6.5 and Q3 = (18+21)/2 = 19.5, so the Numpy/Pandas values look wrong. Why do Numpy/Pandas return these numbers?

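import numpy as np
import pandas as pd

data = np.array([2, 4, 5, 8, 10, 11, 12, 14, 17, 18, 21, 22, 25])

# 75th and 25th percentiles with NumPy's default (linear) interpolation
q75, q25 = np.percentile(data, [75, 25])
print(q75, q25)

# describe() reports the same quartiles in its 25% and 75% rows
df = pd.DataFrame(data)
print(df.describe())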

Numpy returns 18.0 and 8.0. Pandas returns 18.0 and 8.0. But TI-nspire returns 19.5 and 6.5.


Upvotes: 1

Views: 755

Answers (2)

soegaard

Reputation: 31145

You are in for a treat. They are both right.

Unlike most other descriptors, there are several different definitions of Q1 and Q3 in use. For a dataset with a large number of observations, the different definitions give more or less the same result. For small datasets you will see differences, as you experienced.
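As a rough illustration of the sample-size effect, the sketch below measures the distance between the two neighboring sorted values that bracket the 25% position. That distance is the scale on which the definitions disagree, and it shrinks as the sample grows. The random normal samples, the seed, and the helper name `q1_gap` are assumptions made for this example, and the `method=` keyword assumes NumPy 1.22 or newer (older releases call it `interpolation=`).

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, arbitrary choice
small = rng.normal(size=10)           # hypothetical small sample
large = rng.normal(size=100_000)      # hypothetical large sample

def q1_gap(sample):
    """Distance between the two order statistics that bracket Q1."""
    # `method=` assumes NumPy >= 1.22; older releases use `interpolation=`.
    lo = np.percentile(sample, 25, method='lower')
    hi = np.percentile(sample, 25, method='higher')
    return hi - lo

small_gap = q1_gap(small)
large_gap = q1_gap(large)
print(small_gap, large_gap)
```

For the large sample the gap should be tiny, which is why the choice of definition rarely matters there.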

MathWorld lists 5 (five!) different ways of computing quartiles. See http://mathworld.wolfram.com/Quartile.html

Upvotes: 1

shin

Reputation: 32721

This post and this post helped me understand it.

So if you have [7, 15, 36, 39, 40, 41], then 7 -> 0%, 15 -> 20%, 36 -> 40%, 39 -> 60%, 40 -> 80%, 41 -> 100%.

The default interpolation is linear, so for a percentile that falls between two neighboring data points i and j it returns i + (j - i) * fraction. You can set interpolation to midpoint, which calculates (i + j) / 2 instead.
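The linear rule can be checked by hand. This is a minimal sketch, assuming the p-th percentile sits at position (n - 1) * p / 100 in the sorted data, which matches the 0%, 20%, ..., 100% spacing above. `linear_percentile` is a hypothetical helper written for illustration, not a NumPy function.

```python
import math
import numpy as np

def linear_percentile(values, p):
    """Hand-rolled linear interpolation, for illustration only."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100      # fractional index into the sorted data
    lo = math.floor(pos)              # index of the lower neighbor i
    hi = min(lo + 1, len(s) - 1)      # index of the upper neighbor j
    fraction = pos - lo
    return s[lo] + (s[hi] - s[lo]) * fraction

data = [7, 15, 36, 39, 40, 41]
by_hand = [linear_percentile(data, p) for p in (25, 50, 75)]
print(by_hand)                              # [20.25, 37.5, 39.75]
print(np.percentile(data, [25, 50, 75]))    # NumPy's default for comparison
```

For the 25th percentile the position is 1.25, so the result is 15 + (36 - 15) * 0.25 = 20.25.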

import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])
# Note: NumPy 1.22 renamed the `interpolation` keyword to `method`
linear = np.percentile(data, [25, 50, 75], interpolation='linear')
mid = np.percentile(data, [25, 50, 75], interpolation='midpoint')
low = np.percentile(data, [25, 50, 75], interpolation='lower')
high = np.percentile(data, [25, 50, 75], interpolation='higher')
nearest = np.percentile(data, [25, 50, 75], interpolation='nearest')
print(linear, mid, low, high, nearest)
print(15, 37.5, 40)  # presumably the TI-nspire's Q1, median, Q3 for comparison

Output: [image not available]

So I found that none of these interpolation options in Pandas/Numpy reproduces the TI-nspire's Q1 and Q3 exactly.
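If the calculator's numbers are needed in Python, one option is to compute them directly. The sketch below assumes the TI-nspire uses the "median of halves" rule: split the sorted data at the median (leaving the median itself out when the count is odd) and take the median of each half. That assumption is consistent with the values quoted in this thread but is not taken from TI documentation, and `median_of_halves` is a hypothetical helper name.

```python
from statistics import median

def median_of_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves rule (assumed)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]              # values below the median position
    upper = s[len(s) - half:]     # values above the median position
    return median(lower), median(s), median(upper)

print(median_of_halves([2, 4, 5, 8, 10, 11, 12, 14, 17, 18, 21, 22, 25]))
# (6.5, 12, 19.5)
print(median_of_halves([7, 15, 36, 39, 40, 41]))
# (15, 37.5, 40)
```

Both results agree with the calculator values mentioned above: 6.5 and 19.5 for the first dataset, and 15, 37.5, 40 for the second.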

Upvotes: 1
