user3116949

Reputation: 275

How can I grab rows with max date from Pandas dataframe?

I have a Pandas dataframe that looks like this:

(image not available)

and I want to grab, for each distinct ID, the row with the max date, so that my final result looks something like this:

(image not available)

My date column is of data type 'object'. I have tried grouping and then trying to grab the max like the following:

idx = df.groupby(['ID','Item'])['date'].transform(max) == df_Trans['date']
df_new = df[idx]

However I am unable to get the desired result.

Upvotes: 14

Views: 22384

Answers (3)

vegabondx

Reputation: 71

My answer is a generalization of piRSquared's answer:

  • manykey indicates the keys from which the mapping is desired (many-to)

  • onekey indicates the keys to which the mapping is desired (-to-one)

  • sortkey is the sortable key to order by; asc sets the sort direction and defaults to True (ascending, the same default pandas uses)

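    import pandas as pd

    def get_last(df: pd.DataFrame, manykey: list[str], onekey: list[str], sortkey, asc=True):
        # Pass asc by keyword: the second positional slot of sort_values is not `ascending`.
        return df.sort_values(sortkey, ascending=asc).drop_duplicates(subset=manykey, keep='last')[manykey + onekey]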
    

In your case the answer should be:

get_last(df, ["ID"], ["Item"], "date")

Note that I am using the onekey explicitly because I want to drop the rest of the keys (if they are in the table) and create a mapping.
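As a rough usage sketch, here is the helper run on a made-up frame. The sample values are hypothetical (the OP's data is not shown), the column names `ID`, `Item` and `date` are taken from the question's code, and `asc` is passed to `sort_values` by keyword, which is my assumption about the intended behaviour:

```python
from __future__ import annotations

import pandas as pd


def get_last(df: pd.DataFrame, manykey: list[str], onekey: list[str], sortkey, asc=True):
    # Sort by sortkey, then keep the last row seen for each manykey combination.
    ordered = df.sort_values(sortkey, ascending=asc)
    return ordered.drop_duplicates(subset=manykey, keep="last")[manykey + onekey]


# Hypothetical sample data, not the OP's actual frame.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Item": ["a", "b", "c", "d", "e"],
    "date": pd.to_datetime(
        ["2017-01-05", "2017-03-01", "2017-02-10", "2017-02-11", "2017-01-20"]
    ),
})

mapping = get_last(df, ["ID"], ["Item"], "date")
print(mapping)
```

The result keeps only `ID` and `Item`, one row per `ID`, so the `date` column itself is dropped from the output.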

Upvotes: 0

Facundo Scasso

Reputation: 21

The last bit of code from piRSquared's answer is wrong.

We are trying to get distinct IDs, so the column used in drop_duplicates should be 'ID'. keep='last' would then retrieve the last (and max) date for each ID.

df.sort_values(['ID', 'date']).drop_duplicates('ID', keep='last')
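A minimal sketch of this on made-up data (the values are hypothetical, not the OP's frame). Since the question says the `date` column has dtype object, the sketch converts it with `pd.to_datetime` first, assuming the strings parse as dates:

```python
import pandas as pd

# Hypothetical sample data; the OP's actual frame is not shown.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Item": ["a", "b", "c", "d", "e"],
    "date": ["2017-01-05", "2017-03-01", "2017-02-10", "2017-02-11", "2017-01-20"],
})

# Convert the object-dtype strings to real datetimes so sorting is chronological.
df["date"] = pd.to_datetime(df["date"])

# Within each ID the rows are ordered by date, so the last row per ID holds the max date.
latest = df.sort_values(["ID", "date"]).drop_duplicates("ID", keep="last")
print(latest)
```

With duplicates dropped on 'ID', exactly one row per ID survives, even when two rows share the same max date.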

Upvotes: 2

piRSquared

Reputation: 294218

idxmax

Should work so long as index is unique or the maximal index isn't repeated.

df.loc[df.groupby('ID').date.idxmax()]
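A small sketch of the idxmax route on hypothetical data (not the OP's frame). It assumes the object-dtype `date` strings can be parsed by `pd.to_datetime`, which is done up front so that the maximum is taken over real datetimes:

```python
import pandas as pd

# Hypothetical sample data with a default, unique integer index.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Item": ["a", "b", "c", "d"],
    "date": ["2017-01-05", "2017-03-01", "2017-02-10", "2017-02-11"],
})
df["date"] = pd.to_datetime(df["date"])

# idxmax returns, for each ID, the index label of the row holding the max date.
labels = df.groupby("ID")["date"].idxmax()

# Selecting those labels with .loc gives one row per ID.
result = df.loc[labels]
print(result)
```

Because `.loc` selects by label, a repeated index label would pull in every row carrying that label, which is why the index needs to be unique here.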

OP (edited)

Should work as long as maximal values are unique. Otherwise, you'll get all rows equal to the maximum.

df[df.groupby('ID')['date'].transform('max') == df['date']]
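To illustrate the caveat about ties, here is a sketch on hypothetical data where ID 1 has two rows sharing the same max date. The values are made up, and the dates are converted with `pd.to_datetime` on the assumption that they parse cleanly:

```python
import pandas as pd

# Hypothetical sample data: both rows of ID 1 share the same (max) date.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Item": ["a", "b", "c", "d"],
    "date": pd.to_datetime(
        ["2017-03-01", "2017-03-01", "2017-02-10", "2017-02-11"]
    ),
})

# transform('max') broadcasts each group's max date back onto every row of the group.
group_max = df.groupby("ID")["date"].transform("max")

# Every row whose date equals its group's max is kept, so ties are all returned.
result = df[group_max == df["date"]]
print(result)
```

Here ID 1 contributes two rows and ID 2 one row, so this approach only gives one row per ID when the max date is unique within each group.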

W-B's go-to solution

Also a very good solution.

df.sort_values(['ID', 'date']).drop_duplicates('date', keep='last')

Upvotes: 21
