Jaclyn Fink

Reputation: 21

(Pandas) Group by one column and keep only the rows where another column is the max

I have a pandas dataframe with 6 columns, two of which are "date" and "time". For each date, I want to keep only the rows with the maximum time value. For example, in the sample below the date is on the left and the time is on the right, and I want to keep only the rows where the time is 1925.

20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1925
20200109    1830
20200109    1830
20200109    1830
20200109    1830
20200109    1830

I have tried many approaches with sorting and groupby, such as dataframe.groupby('date').apply(lambda x: x.loc[x.time == x.time.max(),['date','time']])

but this returns only the date and time columns. I want all 6 columns in my result.

Edit: I want to keep all of the dates associated with the maximum time.

Upvotes: 2

Views: 2282

Answers (1)

Sajan

Reputation: 1267

Try something like this -

import pandas as pd

dates = [20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200110]
times = [1925, 1925, 1925, 1925, 1925, 1925, 1830, 1830, 1830, 1830, 1830, 1930]
df = pd.DataFrame({'dates': dates, 'times': times})

# Maximum time for each date, as a two-column frame (dates, times)
filt = df.groupby(['dates'])['times'].max().to_frame().reset_index()
# Inner merge keeps only the rows of df whose (dates, times) pair is a per-date maximum
final = pd.merge(df, filt, on=['dates', 'times'])

final 
      dates  times
  0  20200109   1925
  1  20200109   1925
  2  20200109   1925
  3  20200109   1925
  4  20200109   1925
  5  20200109   1925
  6  20200110   1930

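pd.merge keeps every column of df, so if df has more columns (such as your other four), final will include them too. Note that the merge returns a fresh 0-based index rather than the original row labels.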
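To illustrate the point about extra columns, here is a minimal sketch. The 'price' and 'volume' columns and their values are made up for the example (the question does not name the other columns), and the column names follow the question ('date', 'time') rather than the snippet above. It also shows a common alternative that was not part of the original answer: using groupby(...).transform('max') to build a boolean mask, which keeps the original index.

```python
import pandas as pd

# Hypothetical data: 'price' and 'volume' stand in for the asker's other columns.
df = pd.DataFrame({
    'date':   [20200109, 20200109, 20200109, 20200110, 20200110],
    'time':   [1925, 1925, 1830, 1930, 1800],
    'price':  [10.0, 10.5, 9.8, 11.2, 11.0],
    'volume': [100, 150, 120, 90, 80],
})

# Merge approach: compute the max time per date, then inner-merge it back.
max_per_date = df.groupby('date', as_index=False)['time'].max()
merged = pd.merge(df, max_per_date, on=['date', 'time'])

# Alternative: broadcast each date's max time onto its rows and filter with a mask.
is_max = df['time'] == df.groupby('date')['time'].transform('max')
filtered = df[is_max]

print(merged)    # all four columns, new 0-based index
print(filtered)  # same rows, original index labels preserved
```

Both results contain the same rows with all columns. The mask version avoids the intermediate frame and keeps the original row labels, which may matter if the index is meaningful in your data.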

Upvotes: 1
