user2426361

Reputation: 387

DataFrame column-wise comparison to another Series

It seems DataFrame.le doesn't operate in a column-wise fashion.

from pandas import DataFrame, Series
from numpy.random import randn, rand
df = DataFrame(randn(8,12))
series = Series(rand(8))
df.le(series)

I would expect each column in df to be compared against series, row by row (12 column comparisons in total, so 12 columns * 8 rows = 96 element comparisons). Instead, it appears that each element in df is compared against every element in series, which would involve 12 (columns) * 8 (rows) * 8 (elements in series) comparisons. How can I achieve a column-by-column comparison?
My second question: once the column-wise comparison is done, I want to count how many True values there are in each row. I am currently using astype(int32) to turn the bools into ints and then calling sum. Does this sound reasonable?

Let me give an example for the first question to show what I mean (using a smaller example, since an 8*12 frame is hard to display):

>>> from pandas import *
>>> from numpy.random import *
>>> df = DataFrame(randn(2,5))
>>> t = DataFrame(randn(2,1))
>>> df
          0         1         2         3         4
0 -0.090283  1.656517 -0.183132  0.904454  0.157861
1  1.667520 -1.242351  0.379831  0.672118 -0.290858
>>> t
          0
0  1.291535
1  0.151702
>>> df.le(t)
       0      1      2      3      4
0   True  False  False  False  False
1  False  False  False  False  False

I expect column 1 of the result to be:

1  
False  
True     

That is because 1.656517 <= 1.291535 is False and -1.242351 <= 0.151702 is True; this is the column-wise comparison I mean. However, the printed output for that column is False, False.

Upvotes: 0

Views: 574

Answers (1)

Andy Hayden

Reputation: 375535

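I'm not sure I understand the first part of your question, but as to the second part, you can count the Trues in a boolean DataFrame using sum directly, with no need for astype. Here s is the Series from your question; axis=0 counts per column, and axis=1 would count per row: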

In [11]: df.le(s).sum(axis=0)
Out[11]:
0     4
1     3
2     7
3     3
4     6
5     6
6     7
7     6
8     0
9     0
10    0
11    0
dtype: int64
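
Since the question asks for a count per row rather than per column, here is a minimal sketch of both directions. The frame `mask` and its values are made up for illustration and merely stand in for the result of a comparison; booleans sum as 0/1, so the astype step should not be needed.

```python
import pandas as pd

# Illustrative boolean frame standing in for the result of df.le(...)
mask = pd.DataFrame({
    "a": [True, False, True],
    "b": [True, True, False],
    "c": [False, False, False],
})

per_column = mask.sum(axis=0)  # number of True values in each column
per_row = mask.sum(axis=1)     # number of True values in each row

print(per_column.tolist())  # [2, 2, 0]
print(per_row.tolist())     # [2, 1, 1]
```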


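For a single column, the comparison you describe is the following (by default le aligns a Series with the column labels, so pass axis=0 to df.le to get this for every column):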

In [21]: df[0] < s
Out[21]:
0    False
1     True
2    False
3    False
4     True
5     True
6     True
7     True
dtype: bool

Which, for each index label, is testing:

In [22]: df[0].loc[0] < s.loc[0]
Out[22]: False
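
To tie this back to the 2x5 example in the question, here is a sketch of one way to get the column-by-column result. It assumes the thresholds can be taken as a Series (here `t[0]`, the single column of `t`) and uses the `axis=0` argument of `le` so that the Series is aligned with the row index instead of the column labels. The numbers are copied from the question; the variable names `default_result`, `columnwise` and `true_per_row` are only illustrative.

```python
import pandas as pd

# Values copied from the question's example
df = pd.DataFrame([
    [-0.090283, 1.656517, -0.183132, 0.904454, 0.157861],
    [1.667520, -1.242351, 0.379831, 0.672118, -0.290858],
])
t = pd.DataFrame([[1.291535], [0.151702]])

# DataFrame vs DataFrame aligns on row AND column labels, so only
# column 0 has a partner in t; the other columns compare against NaN.
default_result = df.le(t)

# Take t's only column as a Series and align it with the row index:
# every column of df is now compared against it, row by row.
columnwise = df.le(t[0], axis=0)
print(columnwise.values.tolist())
# [[True, False, True, True, True], [False, True, False, False, True]]

# Count the True values in each row; no astype conversion is needed.
true_per_row = columnwise.sum(axis=1)
print(true_per_row.tolist())  # [4, 2]
```

Column 1 of `columnwise` comes out as False, True, which is the result the question expected.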

Upvotes: 1
