Find if a specific (max) value is repeated in a row

Question

For a dataframe as follows:

df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
              (1, 2, 0, 1, 2, 3, 4, 5, 6),
              (1, 2, 3, 0, 1, 2, 3, 0, 1)],
            columns=['P'+str(i+1) for i in range(9)],
            index = ['row1', 'row2', 'row3'])

Resultant df:

        P1  P2  P3  P4  P5  P6  P7  P8  P9
row1    1   2   3   4   0   1   2   3   4
row2    1   2   0   1   2   3   4   5   6
row3    1   2   3   0   1   2   3   0   1

I want to find if a row has the max value occuring more than once, e.g.

 df.max(axis=1)
 >>> row1    4
     row2    6
 >>> row3    3

With row1 and row3 having their max value repeated.

Ideally, the solution is vectorised as I have 40,000 rows and 50 columns.

jezrael · Accepted Answer

Use eq for comapring and count Trues by sum, last filter indices by boolean indexing:

a = df.eq(df.max(axis=1),axis=0).sum(axis=1)
print (a)
row1    2
row2    1
row3    2
dtype: int64

b = a.index[a > 1]
print (b)
Index(['row1', 'row3'], dtype='object')

Detail:

print (df.eq(df.max(axis=1),axis=0))
         P1     P2     P3     P4     P5     P6     P7     P8     P9
row1  False  False  False   True  False  False  False  False   True
row2  False  False  False  False  False  False  False  False   True
row3  False  False   True  False  False  False   True  False  False

Numpy alternative:

arr = df.values
c = (arr == arr.max(axis=1)[:, None]).sum(axis=1)
print (c)
[2 1 2]

a = pd.Series(c, index=df.index)
print (a)
row1    2
row2    1
row3    2
dtype: int32

Find if a specific (max) value is repeated in a row

Answers (1)

Related Questions