Find the earliest record in each group by month

Question

I have a pandas dataframe like this:

Month  Name  Revenue   EARLY_MIN
Jan     A     100        ?
Jan     A      50        ?
Feb     A      30        ?

For each 'Name', I want to pick the earliest record (Jan). If there is more than one such row, I want the one with the minimum Revenue, so 50 in this case. I then want to flag that record with EARLY_MIN=1. In this example, the second row will have EARLY_MIN=1 and the other rows will have EARLY_MIN=0.

How can I do this in pandas in as few steps as possible?

cs95 · Accepted Answer

Using Python's calendar module from the standard library, you can convert the "Month" values to numbers. This makes it easy to order rows by month.

import calendar

mapping = {calendar.month_abbr[k]: k for k in range(1, 13)}
u = df.assign(Month=df.Month.map(mapping))
u

   Month Name  Revenue EARLY_MIN
0      1    A      100         ?
1      1    A       50         ?
2      2    A       30         ?
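As a quick sanity check on the mapping step, here is a minimal sketch that rebuilds the sample frame from the question and lists any month strings the mapping does not cover. It assumes the default English (C) locale, since calendar.month_abbr is locale-dependent; the names month_num and unmapped are illustrative and not part of the answer.

```python
import calendar
import pandas as pd

# Sample frame rebuilt from the question (EARLY_MIN omitted for brevity).
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb"],
    "Name": ["A", "A", "A"],
    "Revenue": [100, 50, 30],
})

# month_abbr[0] is an empty string, so the range starts at 1.
mapping = {calendar.month_abbr[k]: k for k in range(1, 13)}

# Values missing from the mapping (typos, full month names) become NaN.
month_num = df["Month"].map(mapping)

# Hypothetical helper list: month strings that failed to map.
unmapped = df.loc[month_num.isna(), "Month"].unique().tolist()
```

If unmapped is not empty, the later groupby step would silently ignore those rows, so it is worth checking before going further.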

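Now, you can use groupby and idxmin: mask out the Revenue of every row that is not in its group's earliest month, then take the index of the minimum remaining Revenue per Name.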

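# Keep Revenue only for rows in each Name's earliest month; other rows become NaN.
earliest = u.groupby('Name').Month.transform('min')
idx = (u['Revenue'].mask(u['Month'] != earliest)
                   .groupby(u.Name)
                   .idxmin()   # index label of the lowest remaining Revenue per Name
                   .values)
df['EARLY_MIN'] = 0            # default flag for every row
df.loc[idx, 'EARLY_MIN'] = 1   # flag the selected row in each group
df

  Month Name  Revenue  EARLY_MIN
0   Jan    A      100          0
1   Jan    A       50          1
2   Feb    A       30          0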
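For comparison, the same flag can probably be produced with a sort followed by drop_duplicates. This is an alternative sketch, not the method used in the answer: it assumes the index is unique and that every Month value is a standard English three-letter abbreviation. The second group "B" and the helper column month_num are made up for illustration.

```python
import calendar
import pandas as pd

# Sample data: group "A" is from the question, group "B" is hypothetical.
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Mar", "Feb", "Feb"],
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Revenue": [100, 50, 30, 10, 70, 40],
})

mapping = {calendar.month_abbr[k]: k for k in range(1, 13)}

# Sort by numeric month, then Revenue, so the first row seen for each Name
# is its earliest-month, lowest-revenue record.
first_idx = (
    df.assign(month_num=df["Month"].map(mapping))
      .sort_values(["month_num", "Revenue"])
      .drop_duplicates("Name")
      .index
)

# 1 for the selected row in each group, 0 everywhere else.
df["EARLY_MIN"] = df.index.isin(first_idx).astype(int)
```

Note that for "B" the March row has the lowest Revenue overall, but the flag goes to the cheaper of the two February rows, because the month is compared first.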
