A H
A H

Reputation: 2570

Find if a specific (max) value is repeated in a row

For a dataframe as follows:

df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
              (1, 2, 0, 1, 2, 3, 4, 5, 6),
              (1, 2, 3, 0, 1, 2, 3, 0, 1)],
            columns=['P'+str(i+1) for i in range(9)],
            index = ['row1', 'row2', 'row3'])

Resultant df:

        P1  P2  P3  P4  P5  P6  P7  P8  P9
row1    1   2   3   4   0   1   2   3   4
row2    1   2   0   1   2   3   4   5   6
row3    1   2   3   0   1   2   3   0   1

I want to find if a row has the max value occuring more than once, e.g.

 df.max(axis=1)
 >>> row1    4
     row2    6
 >>> row3    3

With row1 and row3 having their max value repeated.

Ideally, the solution is vectorised as I have 40,000 rows and 50 columns.

Upvotes: 2

Views: 54

Answers (1)

jezrael
jezrael

Reputation: 862661

Use eq for comapring and count Trues by sum, last filter indices by boolean indexing:

a = df.eq(df.max(axis=1),axis=0).sum(axis=1)
print (a)
row1    2
row2    1
row3    2
dtype: int64

b = a.index[a > 1]
print (b)
Index(['row1', 'row3'], dtype='object')

Detail:

print (df.eq(df.max(axis=1),axis=0))
         P1     P2     P3     P4     P5     P6     P7     P8     P9
row1  False  False  False   True  False  False  False  False   True
row2  False  False  False  False  False  False  False  False   True
row3  False  False   True  False  False  False   True  False  False

Numpy alternative:

arr = df.values
c = (arr == arr.max(axis=1)[:, None]).sum(axis=1)
print (c)
[2 1 2]

a = pd.Series(c, index=df.index)
print (a)
row1    2
row2    1
row3    2
dtype: int32

Upvotes: 3

Related Questions