Reputation: 2570
For a dataframe as follows:
df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
(1, 2, 0, 1, 2, 3, 4, 5, 6),
(1, 2, 3, 0, 1, 2, 3, 0, 1)],
columns=['P'+str(i+1) for i in range(9)],
index = ['row1', 'row2', 'row3'])
Resultant df:
P1 P2 P3 P4 P5 P6 P7 P8 P9
row1 1 2 3 4 0 1 2 3 4
row2 1 2 0 1 2 3 4 5 6
row3 1 2 3 0 1 2 3 0 1
I want to find if a row has the max value occuring more than once, e.g.
df.max(axis=1)
>>> row1 4
row2 6
>>> row3 3
With row1
and row3
having their max value repeated.
Ideally, the solution is vectorised as I have 40,000 rows and 50 columns.
Upvotes: 2
Views: 54
Reputation: 862661
Use eq
for comapring and count True
s by sum
, last filter indices by boolean indexing
:
a = df.eq(df.max(axis=1),axis=0).sum(axis=1)
print (a)
row1 2
row2 1
row3 2
dtype: int64
b = a.index[a > 1]
print (b)
Index(['row1', 'row3'], dtype='object')
Detail:
print (df.eq(df.max(axis=1),axis=0))
P1 P2 P3 P4 P5 P6 P7 P8 P9
row1 False False False True False False False False True
row2 False False False False False False False False True
row3 False False True False False False True False False
Numpy alternative:
arr = df.values
c = (arr == arr.max(axis=1)[:, None]).sum(axis=1)
print (c)
[2 1 2]
a = pd.Series(c, index=df.index)
print (a)
row1 2
row2 1
row3 2
dtype: int32
Upvotes: 3