Reputation: 21
I have a pandas dataframe with 6 columns, two of them being "date" and "time". For each date, I want to keep only the rows with the maximum time value. For example, below the date is on the left and the time is on the right. Here I want to keep only the rows where the time is 1925.
20200109 1925
20200109 1925
20200109 1925
20200109 1925
20200109 1925
20200109 1925
20200109 1830
20200109 1830
20200109 1830
20200109 1830
20200109 1830
I have tried many solutions using sorting and groupby, such as dataframe.groupby('date').apply(lambda x: x.loc[x.time == x.time.max(), ['date', 'time']])
but this returns only the date and time columns. I want all 6 columns in my result.
Edit: I want to keep all of the dates associated with the maximum time.
Upvotes: 2
Views: 2282
Reputation: 1267
Try something like this:
import pandas as pd

dates = [20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200109, 20200110]
times = [1925, 1925, 1925, 1925, 1925, 1925, 1830, 1830, 1830, 1830, 1830, 1930]
df = pd.DataFrame({'dates': dates, 'times': times})
# Maximum time per date, as a two-column frame (dates, times)
filt = df.groupby('dates')['times'].max().reset_index()
# Inner merge keeps only the rows of df whose (dates, times) pair is a per-date maximum
final = pd.merge(df, filt, on=['dates', 'times'])
final
      dates  times
0  20200109   1925
1  20200109   1925
2  20200109   1925
3  20200109   1925
4  20200109   1925
5  20200109   1925
6  20200110   1930
Even if you add more columns to df, final should keep them as you want: the merge matches only on dates and times, so every other column of df is carried through unchanged.
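An alternative that skips the merge is to compare each row's time with the maximum of its date group, using groupby(...).transform('max'). The following is a minimal sketch, assuming the question's column names date and time; the value column and the sample rows are made up to stand in for the other columns of the real dataframe.

```python
import pandas as pd

# Hypothetical sample data; 'value' stands in for the remaining columns.
df = pd.DataFrame({
    "date": [20200109, 20200109, 20200109, 20200110, 20200110],
    "time": [1925, 1925, 1830, 1930, 1800],
    "value": ["a", "b", "c", "d", "e"],
})

# For every row, the maximum time within that row's date group.
# transform returns a Series aligned with df's index, so it can be compared row by row.
max_time = df.groupby("date")["time"].transform("max")

# Keep the rows whose time equals their date's maximum; all columns are retained.
result = df[df["time"] == max_time]
print(result)
```

Because the filter is a boolean mask on the original dataframe, the original index and every column are preserved, and ties for the maximum time are all kept.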
Upvotes: 1