Reputation: 17097
I have a pandas dataframe like this:
Month Name Revenue EARLY_MIN
Jan A 100 ?
Jan A 50 ?
Feb A 30 ?
For each 'Name', I want to pick the earliest record (Jan). If it returns more than one row, I will pick the record with min. Revenue. So 50 in this case. And I will create a column EARLY_MIN=1 for this record. So in this example, the second row with have EARLY_MIN=1, other rows will have EARLY_MIN=0.
How can I do this in pandas with min. number of steps?
Upvotes: 2
Views: 84
Reputation: 323306
By using the same setting up with Vaishali
#df['Month'] = pd.to_datetime(df.Month, format='%b').dt.month
df['EARLY_MIN']=(~df.sort_values(['Month','Revenue']).duplicated('Name',keep='first')).astype(int)
df
Out[1006]:
Month Name Revenue EARLY_MIN
0 1 A 100 0
1 1 A 50 1
2 2 A 30 0
Upvotes: 4
Reputation: 402593
Using python's calendar
stdlib, you can convert "Months" to a numeric value. This makes it easy for us to order rows based on the "Month" value.
import calendar
mapping = {calendar.month_abbr[k]: k for k in range(1, 13)}
u = df.assign(Month=df.Month.map(mapping))
u
Month Name Revenue EARLY_MIN
0 1 A 100 ?
1 1 A 50 ?
2 2 A 30 ?
Now, you can use groupby
and idxmin
, or something similar.
idx = (u['Revenue'].mask(u.groupby('Name').Month.transform('min') != u['Month'])
.groupby(u.Name)
.idxmin()
.values)
df.loc[idx, 'EARLY_MIN'] = 1
df
Month Name Revenue EARLY_MIN
0 Jan A 100 ?
1 Jan A 50 1
2 Feb A 30 ?
Upvotes: 4