Reputation: 2439
I have a large dataframe with normalised and scaled data, which should be in the range 0-1. But when I print its maximum value I get 1.000000002. The describe() method doesn't show this value. So I'm trying to identify the problem and want to print the row in question. All the other answers I came across talk about printing the row with the maximum value of a certain column. How can I print the row that contains the maximum value of the whole dataframe? I'd appreciate your help!
import pandas as pd

test = pd.DataFrame({'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
                     'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5]})
test.max().max()
Out: 2.37000
Desirable result:
att1 att2
5 0.5 2.37
UPD: I updated the test dataframe because the original one caused confusion (my fault!). I need to print the one row that contains the max value of the whole dataframe.
Upvotes: 1
Views: 2008
Reputation: 25239
Edit:
After the OP's further explanation, I think comparing the values array to values.max() is more flexible:
test[test.values == test.values.max()]
It returns the row(s) holding the maximum value of the dataframe. If att1_max equals att2_max but they sit on different rows, it returns both rows. In that case, if a single row is preferable, add head(1) to it.
att1_max and att2_max on same row:
Out[660]:
att1 att2
0 0.1000 0.01000
1 0.0010 0.00010
2 0.0001 0.00001
3 1.0000 1.10000
4 2.0000 2.20000
5 2.3000 2.37000
6 0.0000 0.00000
7 -1.0000 -1.50000
8 -2.0000 -2.50000
In [661]: test[test.values == test.values.max()]
Out[661]:
att1 att2
5 2.3 2.37
att1_max and att2_max on different rows:
Out[664]:
att1 att2
0 0.1000 0.01000
1 0.0010 0.00010
2 0.0001 0.00001
3 1.0000 1.10000
4 2.0000 2.20000
5 2.3000 1.37000
6 0.0000 0.00000
7 -1.0000 -1.50000
8 -2.0000 -2.50000
In [665]: test[test.values == test.values.max()]
Out[665]:
att1 att2
5 2.3 1.37
att1_max is the same as att2_max but on different rows (in this case the stack approach returns only 1 row, while this one returns both rows):
Out[668]:
att1 att2
0 0.1000 0.01000
1 25.0500 0.00010
2 0.0001 0.00001
3 1.0000 1.10000
4 2.0000 2.20000
5 2.3000 1.37000
6 0.0000 0.00000
7 -1.0000 25.05000
8 -2.0000 -2.50000
In [669]: test[test.values == test.values.max()]
Out[669]:
att1 att2
1 25.05 0.0001
7 -1.00 25.0500
Note: in the last case, if a single row is required, just add head(1):
In [670]: test[test.values == test.values.max()].head(1)
Out[670]:
att1 att2
1 25.05 0.0001
Note 2: if att1_max and att2_max are equal and on the same row, that row will show up twice. In that case, use drop_duplicates() to handle it.
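A possible way to sidestep the duplicate-row case from Note 2 is to collapse the 2-D comparison into a one-value-per-row mask before indexing. The following is only a sketch using the question's sample data; the names global_max, row_mask and rows_with_max are made up for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
    'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5],
})

# Largest value over every cell of the frame.
global_max = test.to_numpy().max()

# One boolean per row: True if any column in that row equals the global max.
row_mask = test.eq(global_max).any(axis=1)

# Each matching row appears once, even if several of its columns hit the max.
rows_with_max = test[row_mask]
print(rows_with_max)
```

Because the mask has exactly one entry per row, ties across different rows still return every matching row, and head(1) can be appended in the same way if only one is wanted.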
Original:
@Wen-Ben's answer is good, but I think using stack here is unnecessary. I prefer idxmax and drop_duplicates:
test.iloc[test.idxmax()].drop_duplicates()
or
test.loc[test.idxmax().drop_duplicates()]
att1_max and att2_max on same row:
In [510]: test.iloc[test.idxmax()].drop_duplicates()
Out[510]:
att1 att2
5 2.3 2.37
att1_max and att2_max on different rows:
In [513]: test.iloc[test.idxmax()].drop_duplicates()
Out[513]:
att1 att2
5 2.3 1.37
4 2.0 2.20
So, if att1_max and att2_max are on the same row, exactly 1 row is returned. If they are on different rows, 2 rows are returned: the ones where att1_max and att2_max occur.
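One detail worth keeping in mind: idxmax returns index labels, not positions, so the iloc form only lines up when the frame has the default 0..n-1 index. Below is a small sketch with a string index (the letter labels are invented for the example) showing the loc form, plus a way to narrow the result to the single row holding the overall maximum by taking the row-wise maximum first.

```python
import pandas as pd

# Same data as the question, but with hypothetical string labels as the index.
test = pd.DataFrame({
    'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
    'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5],
}, index=list('abcdefghi'))

# idxmax gives one row label per column; loc looks rows up by label.
per_column_rows = test.loc[test.idxmax().drop_duplicates()]
print(per_column_rows)

# Row-wise maximum, then the label of the row whose maximum is largest.
global_label = test.max(axis=1).idxmax()
global_row = test.loc[[global_label]]
print(global_row)
```

The first result still contains one row per column maximum; only the second is restricted to the maximum of the whole dataframe.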
Upvotes: 0
Reputation: 153460
Let's use np.where, which returns the row and column indices:
r, _ = np.where(test.values == np.max(test.values))
test.iloc[r]
Output:
att1 att2
5 2.3 2.37
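Since np.where also hands back the column positions, they can be used to report which column holds the offending value, which may help when tracking down a stray 1.000000002. This is a sketch on the question's sample data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
    'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5],
})

values = test.to_numpy()

# Positional row and column indices of every cell equal to the global max.
rows, cols = np.where(values == values.max())

# Translate positions back into index and column labels.
for r, c in zip(rows, cols):
    print(test.index[r], test.columns[c], values[r, c])

# np.unique keeps a row from being selected twice if it holds the max twice.
max_rows = test.iloc[np.unique(rows)]
print(max_rows)
```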
Upvotes: 0
Reputation: 323226
I am using idxmax here after stack:
test.iloc[[test.stack().idxmax()[0]]]
Out[154]:
att1 att2
5 2.3 2.37
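For reference, stack().idxmax() returns a (row label, column label) pair, and the [0] above picks the row label. With a non-default index that label would need loc rather than iloc. A minimal sketch, assuming a made-up string index:

```python
import pandas as pd

# Same data as the question, with hypothetical string labels as the index.
test = pd.DataFrame({
    'att1': [0.1, 0.001, 0.0001, 1, 2, 0.5, 0, -1, -2],
    'att2': [0.01, 0.0001, 0.00001, 1.1, 2.2, 2.37, 0, -1.5, -2.5],
}, index=list('abcdefghi'))

# stack() turns the frame into a Series keyed by (row label, column label),
# so idxmax() yields the pair that locates the global maximum.
row_label, col_label = test.stack().idxmax()

# Double brackets keep the result a one-row DataFrame instead of a Series.
result = test.loc[[row_label]]
print(row_label, col_label)
print(result)
```

As noted in the other answer, this approach returns a single row even when the maximum occurs on several rows.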
Upvotes: 5