Any idea why quantile behaves like this?
Base data:
In [1]: tmc_sum.head(6)
Out[1]:
              1     2     3     8     9    10
tmc
110+05759  7469  7243  7307  7347  7271  7132
110P05759  7730  7432  7482  7559  7464  7305
110+05095  7256  6784  6697  6646  6786  6530
110P05095     0     0     0     0     0     0
110+05096  6810  5226  5625  5035  5064  4734
110P05096  6854  5041  5600  5308  5261  4747
Prelude:
As per the documentation of quantile, this works correctly:
In [2]: tmc_sum.quantile(0.05, axis=1)
Out[2]:
1     3347.50
2     1882.40
3     1933.10
8     1755.00
9     1554.15
10    1747.85
dtype: float64
It correctly computes the 5th percentile by columns. (Note that there are more columns than the six printed above.)
Problem:
But this doesn't work as expected:
In [3]: tmc_sum.quantile(0.05, axis=0)
Out[3]:
1     3347.50
2     1882.40
3     1933.10
8     1755.00
9     1554.15
10    1747.85
dtype: float64
This again computes by column, although according to the documentation it should compute by row. So I would expect something like this:
In [4]: tmc_sum.apply(lambda x: np.percentile(x, 0.05), axis=1).head(6)
Out[4]:
tmc
110+05759    7132.2775
110P05759    7305.3175
110+05095    6530.2900
110P05095       0.0000
110+05096    4734.7525
110P05096    4747.7350
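One detail in In [4] is worth checking: np.percentile takes its q argument as a percentage from 0 to 100, whereas DataFrame.quantile takes a fraction from 0 to 1. So np.percentile(x, 0.05) asks for the 0.05th percentile, not the 5th. A minimal sketch on made-up data (the row and frame below are hypothetical, not tmc_sum), assuming a pandas version where axis=1 is honoured:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single "row" holding the values 0..100.
row = pd.Series(np.arange(101, dtype=float))

# pandas quantile: q is a fraction in [0, 1].
q_pandas = row.quantile(0.05)
# np.percentile: q is a percentage in [0, 100], so 5 means the 5th percentile.
q_numpy = np.percentile(row, 5)
# Passing 0.05 to np.percentile asks for the 0.05th percentile instead.
q_tiny = np.percentile(row, 0.05)
print(q_pandas, q_numpy, q_tiny)

# Row-wise, the matching apply call therefore uses 5, not 0.05.
df = pd.DataFrame(np.arange(20, dtype=float).reshape(2, 10), index=["r0", "r1"])
by_apply = df.apply(lambda x: np.percentile(x, 5), axis=1)
by_quantile = df.quantile(0.05, axis=1)
print(by_apply)
print(by_quantile)
```

Both row-wise results are indexed by the row labels r0 and r1, which is the shape of result the question expects from axis=1.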
Is this behavior expected and am I missing something, or is it a bug?
This was a bug in pandas 0.14.0 (the axis keyword was ignored) and is fixed in 0.14.1 (see https://github.com/pydata/pandas/pull/7312).
If you can't upgrade, you can get the desired behaviour with df.T.quantile(0.5).
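As a quick sanity check of that workaround, transposing and using the default axis=0 should give the same per-row result as axis=1 on a version where the keyword is respected (0.14.1 or later, per the fix mentioned above). A minimal sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame; we want one median per row.
df = pd.DataFrame({"a": [0, 3], "b": [1, 4], "c": [2, 5]}, dtype=float)

# Workaround for 0.14.0: transpose so the original rows become columns,
# then rely on the default axis=0 (one result per original row).
via_transpose = df.T.quantile(0.5)

# On a fixed version the axis keyword is honoured, so this should match.
via_axis = df.quantile(0.5, axis=1)

print(via_transpose)
print(via_axis)
```

Both Series are indexed by the original row labels (0 and 1).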
BTW, it is the axis=1 case that is not working correctly. The default, axis=0, computes the quantile of each column, while axis=1 computes it 'along the columns' for each row. A small example; consider:
In [3]: df
Out[3]:
   a  b  c
0  0  1  2
1  3  4  5
With the default value, axis=0:
In [4]: df.quantile(0.5, axis=0)
Out[4]:
a 1.5
b 2.5
c 3.5
dtype: float64
And with axis=1:
In [5]: df.quantile(0.5, axis=1)
Out[5]:
0 1
1 4
dtype: float64
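The same convention holds in NumPy: axis names the dimension that gets reduced away. A short sketch reproducing Out[4] and Out[5] with np.quantile (available in NumPy 1.15 and later) on the frame's underlying array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["a", "b", "c"])
values = df.to_numpy(dtype=float)

# axis=0 collapses the rows: one median per column (a, b, c).
per_column = np.quantile(values, 0.5, axis=0)
# axis=1 collapses the columns: one median per row (0, 1).
per_row = np.quantile(values, 0.5, axis=1)

print(per_column)  # [1.5 2.5 3.5]
print(per_row)     # [1. 4.]

# pandas gives the same numbers, labelled by columns and rows respectively.
assert np.allclose(per_column, df.quantile(0.5, axis=0))
assert np.allclose(per_row, df.quantile(0.5, axis=1))
```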