aviss

Reputation: 2439

Pandas: find maximum value across all columns and print this row

I have a large dataframe with normalised and scaled data, which should be in the range 0-1. But when I print its maximum value I get 1.000000002. The describe() method doesn't show this value. So I'm trying to identify the problem and want to print the row in question. All the other answers I came across talk about printing the row with the maximum value of a certain column. How can I print the row that contains the maximum value of the whole dataframe? I'd appreciate your help!

import pandas as pd

test = pd.DataFrame({'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
                     'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5]})
test.max().max()
Out: 2.37000

Desirable result:

    att1    att2
5   0.5     2.37

UPD: I updated the test dataframe as it caused confusion (my fault!). I need to print the one row that contains the max value of the whole dataframe.

Upvotes: 1

Views: 2008

Answers (3)

Andy L.

Reputation: 25239

Edit:
After the OP's further explanation, I think comparing the values array to values.max() is more flexible:

test[test.values == test.values.max()]

It returns the row containing the dataframe's maximum value. If att1_max is the same as att2_max but on a different row, it returns both rows. In that case, if a single row is preferable, add head(1) to it.

att1_max and att2_max on same row:

Out[660]:
     att1     att2
0  0.1000  0.01000
1  0.0010  0.00010
2  0.0001  0.00001
3  1.0000  1.10000
4  2.0000  2.20000
5  2.3000  2.37000
6  0.0000  0.00000
7 -1.0000 -1.50000
8 -2.0000 -2.50000

In [661]: test[test.values == test.values.max()]
Out[661]:
   att1  att2
5   2.3  2.37

att1_max and att2_max on different rows:

Out[664]:
     att1     att2
0  0.1000  0.01000
1  0.0010  0.00010
2  0.0001  0.00001
3  1.0000  1.10000
4  2.0000  2.20000
5  2.3000  1.37000
6  0.0000  0.00000
7 -1.0000 -1.50000
8 -2.0000 -2.50000

In [665]: test[test.values == test.values.max()]
Out[665]:
   att1  att2
5   2.3  1.37

att1_max is the same as att2_max but on a different row (in this case stack returns only one row, while this returns both rows):

Out[668]:
      att1      att2
0   0.1000   0.01000
1  25.0500   0.00010
2   0.0001   0.00001
3   1.0000   1.10000
4   2.0000   2.20000
5   2.3000   1.37000
6   0.0000   0.00000
7  -1.0000  25.05000
8  -2.0000  -2.50000

In [669]: test[test.values == test.values.max()]
Out[669]:
    att1     att2
1  25.05   0.0001
7  -1.00  25.0500

Note: in the last case, if a single row is required, just add head(1):

In [670]: test[test.values == test.values.max()].head(1)
Out[670]:
    att1    att2
1  25.05  0.0001    

Note 2: if att1_max and att2_max are equal and on the same row, that row is shown twice. In that case, use drop_duplicates() to handle it.
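One possible way to sidestep the duplicated row, sketched here on the assumption that a one-dimensional row mask is acceptable: compare the whole frame to its global maximum and reduce with any(axis=1). The small frame and the names max_value and row_has_max are illustrative only, not part of the original answer.

```python
import pandas as pd

# Illustrative frame: row 1 holds the maximum in both columns.
test = pd.DataFrame({'att1': [0.1, 25.05, 0.5],
                     'att2': [0.01, 25.05, 2.37]})

# Global maximum over every cell of the frame.
max_value = test.to_numpy().max()

# One boolean per row: True if any cell in that row equals the maximum.
row_has_max = (test == max_value).any(axis=1)

# A 1-D mask selects each matching row once, even if it matches twice.
result = test[row_has_max]
print(result)
```

Because the mask has one entry per row, a row that matches in several columns should still appear only once, so drop_duplicates() is not needed in this variant.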

Original:

@Wen-Ben's answer is good, but I think using stack here is unnecessary. I prefer idxmax and drop_duplicates:

test.iloc[test.idxmax()].drop_duplicates()    

or

test.loc[test.idxmax().drop_duplicates()]

att1_max and att2_max on same row:

In [510]: test.iloc[test.idxmax()].drop_duplicates()
Out[510]:
   att1  att2
5   2.3  2.37

att1_max and att2_max on different rows:

In [513]: test.iloc[test.idxmax()].drop_duplicates()
Out[513]:
   att1  att2
5   2.3  1.37
4   2.0  2.20

So, when att1_max and att2_max are on the same row, this returns exactly one row. When they are on different rows, it returns the two rows where att1_max and att2_max occur.
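A caveat worth illustrating: idxmax returns index labels, not positions, so the iloc form only lines up when the frame has the default 0..n-1 index. The sketch below assumes a hypothetical string index ('a', 'b', 'c') to show the loc form on such a frame; the data and the name labels are made up for the example.

```python
import pandas as pd

# Hypothetical frame with a non-default (string) index.
test = pd.DataFrame({'att1': [0.1, 2.3, 2.0],
                     'att2': [0.01, 1.37, 2.2]},
                    index=['a', 'b', 'c'])

# idxmax gives the index label of each column's maximum:
# att1 -> 'b', att2 -> 'c'.
labels = test.idxmax()

# .loc looks rows up by label, so it works for any index type.
result = test.loc[labels.drop_duplicates()]
print(result)
```

With this index, passing the same labels to iloc would fail, since iloc expects integer positions.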

Upvotes: 0

Scott Boston

Reputation: 153460

Let's use np.where, which returns the row and column indices:

import numpy as np

r, _ = np.where(test.values == np.max(test.values))
test.iloc[r]

Output:

   att1  att2
5   2.3  2.37
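Since the goal in the question is to track down an unexpected value, it may also help to keep the column positions that np.where returns instead of discarding them. The following is a minimal sketch using the data shown in the answers; the names values, rows, cols, max_columns and max_rows are illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'att1': [0.1, 0.001, 0.0001, 1, 2, 2.3, 0, -1, -2],
                     'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5]})

values = test.to_numpy()

# On a 2-D boolean array, np.where returns (row positions, column positions).
rows, cols = np.where(values == values.max())

# Translate the column positions back into column names.
max_columns = test.columns[cols].tolist()

# np.unique avoids repeating a row that holds the maximum in several columns.
max_rows = test.iloc[np.unique(rows)]

print(max_columns)
print(max_rows)
```

Here rows and cols are positional, so iloc is the matching accessor regardless of what the index labels are.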

Upvotes: 0

BENY

Reputation: 323226

I am using idxmax here after stack:

test.iloc[[test.stack().idxmax()[0]]]
Out[154]: 
   att1  att2
5   2.3  2.37
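To unpack what the one-liner does, here is a small sketch on a shortened, made-up frame: stack() yields a Series indexed by (row label, column label), so idxmax() returns a tuple and [0] picks the row label. The sketch uses loc rather than iloc on the assumption that looking up by label is the intent; row_label and col_label are illustrative names.

```python
import pandas as pd

test = pd.DataFrame({'att1': [0.1, 2.3, 2.0],
                     'att2': [0.01, 2.37, 2.2]})

# stack() turns the frame into a Series keyed by (row label, column label),
# so idxmax() identifies the single largest cell as a tuple.
row_label, col_label = test.stack().idxmax()

# Double brackets keep the result a one-row DataFrame rather than a Series.
result = test.loc[[row_label]]

print(row_label, col_label)
print(result)
```

As noted in the other answer, idxmax reports only the first occurrence, so ties on other rows would not be returned by this approach.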

Upvotes: 5
