jjunk

Reputation: 97

How can I interpret the pandas quartiles?

I have a pandas DataFrame df with a column A. The values of A are based on predictions, and I've forced them to be greater than or equal to 0.00000001. Now when I run df.A.describe() I get:

count    3.900000e+02
mean     1.047049e-05
std      7.774749e-05
min      1.000000e-08
25%      1.000000e-08
50%      1.000000e-08
75%      1.000000e-08
max      1.008428e-03

The way I understand it, this means that at least 75% of my values for A are equal to 0.00000001. However, when I run x = len(df.loc[df['A'] == 0.00000001]) I get x = 207, and 207/390 < 0.75. Shouldn't I get a value for x that is greater than 292 (390 * 0.75 = 292.5)?

Upvotes: 2

Views: 168

Answers (1)

jjunk

Reputation: 97

For anyone who might run into a similar problem, I've found the answer:
Only 207 values in my df satisfy df.A == 0.00000001 exactly. However, there are also values that are just marginally bigger (e.g. something like 0.0000000100000000001). Those values are not exactly equal to 0.00000001, so the == comparison skips them, but when I print the df or call df.A.describe() they are displayed as 1.000000e-08, because the output is rounded to a few significant digits and the difference is too small to show. The quartiles are therefore not exactly 0.00000001 either; they only look that way in the printed output.
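To make this concrete, here is a minimal sketch that reproduces the effect on made-up data. The numbers are assumptions chosen for illustration (207 values exactly at the floor, 100 values a hair above it, 83 clearly larger values), not the data from the question. It compares an exact == count with a tolerance-based count using np.isclose, and prints the 75% quantile at two precisions.

```python
import numpy as np
import pandas as pd

floor = 1e-8  # the lower bound from the question

# Hypothetical data, NOT the original df: 390 values in total.
values = np.concatenate([
    np.full(207, floor),                # exactly equal to the floor
    np.full(100, floor * (1 + 1e-10)),  # marginally above the floor
    np.linspace(1e-6, 1e-3, 83),        # clearly larger values
])
df = pd.DataFrame({"A": values})

# Exact comparison only counts values that are bit-for-bit equal.
exact = int((df["A"] == floor).sum())

# Tolerance-based comparison. atol is set to 0 on purpose: the default
# atol of np.isclose is 1e-8, which is as large as the values themselves.
close = int(np.isclose(df["A"], floor, rtol=1e-6, atol=0.0).sum())

q75 = df["A"].quantile(0.75)

print(exact, close)      # 207 307
print(f"{q75:.6e}")      # rounded display, looks like the floor
print(f"{q75:.17e}")     # full precision reveals the small difference
print(q75 == floor)      # False
```

In this sketch the 75% quantile falls among the "marginally above" values, so it prints as 1.000000e-08 at the default-like precision even though it is not equal to the floor. Raising the display precision (for example with pd.set_option("display.precision", 17)) is another way to spot this in describe() output.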

Upvotes: 2
