Shamoon

Reputation: 43491

How can I get the row with a min for a certain column in a Pandas DataFrame?

My DataFrame is:

                                               model  epochs       loss
0  <keras.engine.sequential.Sequential object at ...       1  0.0286867
1  <keras.engine.sequential.Sequential object at ...       1  0.0210836
2  <keras.engine.sequential.Sequential object at ...       1  0.0250625
3  <keras.engine.sequential.Sequential object at ...       1   0.109146
4  <keras.engine.sequential.Sequential object at ...       1   0.253897

I want to get the row with the lowest loss.

I'm trying self.models['loss'].idxmin(), but that gives an error: TypeError: reduction operation 'argmin' not allowed for this dtype

Upvotes: 0

Views: 234

Answers (3)

Ananay Mital

Reputation: 1475

There are a number of ways to do exactly that:

Consider this example dataframe

df

   level   beta
0      0  0.338
1      1  0.294
2      2  0.308
3      3  0.257
4      4  0.295
5      5  0.289
6      6  0.269
7      7  0.259
8      8  0.288
9      9  0.302

1) Using pandas conditionals

df[df.beta == df.beta.min()]  #returns pandas DataFrame object

   level   beta
3      3  0.257

2) Using sort_values and choosing the first (0th) row

df.sort_values(by="beta").iloc[0]    #returns pandas Series object

level        3
beta     0.257
Name: 3, dtype: object

These are the most readable methods, I guess.
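One difference worth knowing between the two methods is how they handle ties. The sketch below uses a small hypothetical frame (not the one above) in which two rows share the minimum: the conditional keeps every tied row, while sort_values followed by iloc[0] returns a single row. Passing kind="stable" is an assumption added here so that the earliest tied row is the one returned.

```python
import pandas as pd

# Hypothetical frame with a tie for the minimum beta (rows 1 and 3)
df = pd.DataFrame({"level": [0, 1, 2, 3], "beta": [0.30, 0.25, 0.28, 0.25]})

# 1) Conditional: returns a DataFrame containing every row tied for the minimum
by_mask = df[df["beta"] == df["beta"].min()]

# 2) Sort and take the first row: returns a Series for exactly one row.
#    kind="stable" keeps tied rows in their original order.
by_sort = df.sort_values(by="beta", kind="stable").iloc[0]

print(by_mask)
print(by_sort)
```

If only one row is wanted, the second form (or idxmin) is probably the safer choice; if all tied rows matter, the conditional is.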

Edit:

I made this graph to visualize the time taken by the two methods above as the number of rows in the dataframe increases. Although it largely depends on the dataframe in question, in my measurements sort_values was considerably faster than the conditional once the number of rows was greater than 1000 or so.

[Figure: time taken by sort_values and the conditional vs. number of rows in the dataframe]
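Timings like these depend heavily on the machine, the pandas version, and the data, so it is worth measuring on your own frame. A minimal sketch of such a comparison follows; the helper name time_methods, the random data, and the row counts are all assumptions made for illustration, not the setup used for the graph.

```python
import timeit

import numpy as np
import pandas as pd


def time_methods(n_rows, repeats=5):
    """Return (conditional_seconds, sort_seconds) averaged over `repeats` runs."""
    rng = np.random.default_rng(0)  # fixed seed so the data is reproducible
    df = pd.DataFrame({"level": np.arange(n_rows), "beta": rng.random(n_rows)})
    mask_t = timeit.timeit(lambda: df[df["beta"] == df["beta"].min()], number=repeats)
    sort_t = timeit.timeit(lambda: df.sort_values(by="beta").iloc[0], number=repeats)
    return mask_t / repeats, sort_t / repeats


for n in (100, 1_000, 10_000, 100_000):
    mask_t, sort_t = time_methods(n)
    print(f"{n:>7} rows: conditional {mask_t:.6f}s, sort_values {sort_t:.6f}s")
```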

Upvotes: 1

Hope this works

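import pandas as pd
# Example frame with a numeric 'loss' column
df = pd.DataFrame({'epochs':[1,1,1,1,1],'loss':[0.0286867,0.0286867,0.0210836,0.0109146,0.0109146]})
# idxmin() returns the index label of the first minimum; .loc fetches that row
out = df.loc[df['loss'].idxmin()]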
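The TypeError in the question suggests the loss column has object dtype rather than a numeric one. Assuming the values are stored as strings or other objects that can be parsed as numbers (a guess, since the question does not show how the frame was built), converting the column first may be enough. The frame below is a hypothetical stand-in for self.models:

```python
import pandas as pd

# Hypothetical frame mimicking the question: losses stored as strings,
# which gives the column object dtype
models = pd.DataFrame({
    "epochs": [1, 1, 1, 1, 1],
    "loss": ["0.0286867", "0.0210836", "0.0250625", "0.109146", "0.253897"],
})
print(models["loss"].dtype)  # object

# Convert to a numeric dtype; errors="coerce" turns unparseable values into NaN
models["loss"] = pd.to_numeric(models["loss"], errors="coerce")

# idxmin() skips NaN by default and returns the label of the smallest loss
best = models.loc[models["loss"].idxmin()]
print(best)
```

Checking models.dtypes before and after the conversion is a quick way to confirm whether this was the cause.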

Upvotes: 1

Ben Pap

Reputation: 2579

self.models[self.models['loss'] == self.models['loss'].min()]

This will give you the row with the lowest loss (as long as self.models is your DataFrame). Add .index to get the index label.
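As a small sketch of the .index part, using a hypothetical standalone frame named models in place of self.models:

```python
import pandas as pd

# Hypothetical stand-in for self.models
models = pd.DataFrame({"epochs": [1, 1, 1], "loss": [0.0286867, 0.0210836, 0.0250625]})

best_rows = models[models["loss"] == models["loss"].min()]

print(best_rows.index)     # an Index object with one label per matching row
print(best_rows.index[0])  # the first matching label as a plain value
```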

Upvotes: 1
