Reputation: 387
It seems dataframe.le doesn't operate column wise fashion.
df = DataFrame(randn(8,12))
series=Series(rand(8))
df.le(series)
I would expect for each column in df
it will compare to series
(so total 12 columns comparison with series
, so 12 column*8 row comparison involved). But it appears for each element in df
it will compare against every elements in series
so this will involves 12(columns)*8(rows) * 8(elements in series) comparison. How can I achieve column by column comparison?
Second question is once I am done with column wise comparison I want to be able to count for each row how many 'true' are there, I am currently doing astype(int32)
to turn bool into int then do sum
, does this sound reasonable?
Let me give an example about the first question to show what I meant (using a simpler example since show 8*12 is tough):
>>>from pandas import *
>>>from numpy.random import *
>>>df = DataFrame(randn(2,5))
>>>t = DataFrame(randn(2,1))
>>>df
0 1 2 3 4
0 -0.090283 1.656517 -0.183132 0.904454 0.157861
1 1.667520 -1.242351 0.379831 0.672118 -0.290858
>>>t
0
0 1.291535
1 0.151702
>>>df.le(t)
0 1 2 3 4
0 True False False False False
1 False False False False False
What I expect df's column 1 should be:
1
False
True
Because 1.656517 < 1.291535
is False
and -1.242351 < 0.151702
is True
, this is column wise comparison. However the print out is False False
.
Upvotes: 0
Views: 574
Reputation: 375535
I'm not sure I understand the first part of your question, but as to the second part, you can count the True
s in a boolean DataFrame using sum
:
In [11]: df.le(s).sum(axis=0)
Out[11]:
0 4
1 3
2 7
3 3
4 6
5 6
6 7
7 6
8 0
9 0
10 0
11 0
dtype: int64
.
Essentially le
is testing for each column:
In [21]: df[0] < s
Out[21]:
0 False
1 True
2 False
3 False
4 True
5 True
6 True
7 True
dtype: bool
Which for each index is testing:
In [22]: df[0].loc[0] < s.loc[0]
Out[22]: False
Upvotes: 1