Reputation: 3811
df
order_date Month Name Year Days Data
2015-12-20 Dec 2014 1 3
2016-1-21 Jan 2014 2 3
2015-08-20 Aug 2015 1 1
2016-04-12 Apr 2016 4 1
and so on
Code:
df = df.groupby(["Year", "Month Name"], as_index=False)["days"].agg(['min', 'mean'])
df3 = df.groupby(["Year", "Month Name"], as_index=False)["Data"].agg(['count'])
merged_df = pd.merge(df3, df, on=['Year', 'Month Name'])
I get the following groupby output:
Min Mean Count
Year Month Name
2015 Aug 2 11 200
Dec 5 13 130
Feb 3 15 100
Jan 4 20 123
May 1 21 342
Nov 2 12 234
2016 Apr 1 10 200
Dec 2 12 120
Feb 2 13 200
Jan 2 24 200
Sep 1 25 220
Issue:
The groupby output is sorted alphabetically by Month Name (A to Z), so I get Apr, Aug, Dec, Feb, etc. rather than Jan, Feb, ... Dec. How can I get the output sorted by month number?
I need output like 2016: Jan, Feb, ... Dec, then 2017: Jan, Feb, Mar, ... Dec.
Please keep in mind that two DataFrames are merged. I have presented simplified code here (the real code is different); I need to merge both, and only then can I continue working with the result.
Upvotes: 2
Views: 2292
Reputation: 863281
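EDIT: Your solution should be changed: don't overwrite df with the first aggregation (the second groupby still needs the original Data column), and merge the two results on their shared (Year, Month Name) index levels:
import pandas as pd
# keep the default grouping index so both results share the (Year, Month Name) MultiIndex
df1 = df.groupby(["Year", "Month Name"])["Days"].agg(['min', 'mean'])
df3 = df.groupby(["Year", "Month Name"])["Data"].agg(['count'])
# merge on the index level names, then move the levels back into columns
merged_df = pd.merge(df3, df1, on=['Year', 'Month Name']).reset_index()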
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
merged_df['Month Name'] = pd.Categorical(merged_df['Month Name'],categories=cats, ordered=True)
merged_df = merged_df.sort_values(["Year", "Month Name"])
print (merged_df)
Year Month Name count min mean
1 2014 Jan 1 2 2
0 2014 Dec 1 1 1
2 2015 Aug 1 1 1
3 2016 Apr 1 4 4
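A variation on the categorical idea, sketched on the four sample rows from the question and assuming a reasonably recent pandas (1.x/2.x): if Month Name is made an ordered categorical before grouping, groupby itself sorts the keys in calendar order, so both aggregated frames and their combination come out Jan to Dec without a separate sort. observed=True matters here, because with categorical keys groupby otherwise adds a row for every month of every year. The two frames are combined with DataFrame.join on their identical index, a swap for pd.merge that lines up the same rows.

```python
import pandas as pd

# The four sample rows from the question (order_date left out; it is not used here)
df = pd.DataFrame({
    "Month Name": ["Dec", "Jan", "Aug", "Apr"],
    "Year": [2014, 2014, 2015, 2016],
    "Days": [1, 2, 1, 4],
    "Data": [3, 3, 1, 1],
})

cats = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Convert the month to an ordered categorical *before* grouping
df["Month Name"] = pd.Categorical(df["Month Name"], categories=cats, ordered=True)

keys = ["Year", "Month Name"]
# observed=True keeps only the (Year, month) pairs present in the data;
# without it every year would get a row for all 12 months
df1 = df.groupby(keys, observed=True)["Days"].agg(["min", "mean"])
df3 = df.groupby(keys, observed=True)["Data"].agg(["count"])

# Both frames share the same (Year, Month Name) index, already in calendar order,
# so an index join lines them up without re-sorting
merged_df = df3.join(df1)
print(merged_df)
```

If you need Year and Month Name as ordinary columns afterwards, merged_df.reset_index() keeps this row order.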
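Or, without merging, compute all three statistics in a single groupby using named aggregation (pandas 0.25+):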
df1 = (df.groupby(["Year", "Month Name"])
.agg(min_days=("Days", 'min'),
avg_days=("Days", 'mean'),
count = ('Data', 'count'))
.reset_index())
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1['Month Name'] = pd.Categorical(df1['Month Name'], categories=cats, ordered=True)
df1 = df1.sort_values(["Year", "Month Name"])
print (df1)
Year Month Name min_days avg_days count
1 2014 Jan 2 2 1
0 2014 Dec 1 1 1
2 2015 Aug 1 1 1
3 2016 Apr 4 4 1
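If you would rather not convert the column to a categorical, DataFrame.sort_values accepts a key function (pandas 1.1+) that is called on each sort column separately. A minimal sketch under that assumption; the small df1 is typed in by hand from the output above, and month_num is a hypothetical helper dict built from the same cats list:

```python
import pandas as pd

# A flat result like df1 above, in the alphabetical order groupby returns
# (values typed in from the answer's output)
df1 = pd.DataFrame({
    "Year": [2014, 2014, 2015, 2016],
    "Month Name": ["Dec", "Jan", "Aug", "Apr"],
    "min_days": [1, 2, 1, 4],
})

cats = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Hypothetical helper: Jan -> 1 ... Dec -> 12
month_num = {m: i for i, m in enumerate(cats, start=1)}

# key= receives each sort column as a Series (named after the column):
# translate month names to numbers and leave Year unchanged
df1 = df1.sort_values(
    ["Year", "Month Name"],
    key=lambda s: s.map(month_num) if s.name == "Month Name" else s,
)
print(df1)
```

The key only affects the sort order; the Month Name column itself keeps its string values.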
A last solution keeps the MultiIndex and uses no categoricals: it creates a helper dates column from the index and sorts by it:
df1 = (df.groupby(["Year", "Month Name"])
.agg(min_days=("Days", 'min'),
avg_days=("Days", 'mean'),
count = ('Data', 'count'))
)
df1['dates'] = pd.to_datetime([f'{y}{m}' for y, m in df1.index], format='%Y%b')
df1 = df1.sort_values('dates')
print (df1)
min_days avg_days count dates
Year Month Name
2014 Jan 2 2 1 2014-01-01
Dec 1 1 1 2014-12-01
2015 Aug 1 1 1 2015-08-01
2016 Apr 4 4 1 2016-04-01
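With the MultiIndex kept, the helper column can also be avoided: DataFrame.sort_index accepts a key function (pandas 1.1+) that, for a MultiIndex, is applied to each level separately. A sketch under that assumption, rebuilding a small df1 by hand; month_num is a hypothetical lookup dict:

```python
import pandas as pd

# A MultiIndex frame like df1 above, in the alphabetical order groupby produces
idx = pd.MultiIndex.from_tuples(
    [(2014, "Dec"), (2014, "Jan"), (2015, "Aug"), (2016, "Apr")],
    names=["Year", "Month Name"],
)
df1 = pd.DataFrame({"min_days": [1, 2, 1, 4]}, index=idx)

cats = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Hypothetical helper: Jan -> 1 ... Dec -> 12
month_num = {m: i for i, m in enumerate(cats, start=1)}

# The key is called once per index level (each level keeps its name):
# map month names to numbers, leave Year as is
df1 = df1.sort_index(
    key=lambda level: level.map(month_num) if level.name == "Month Name" else level
)
print(df1)
```

Only the sort order is computed from the mapped values; the index labels stay as the original month names.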
Upvotes: 4
Reputation: 75110
Here is my solution, taking merged_df (with its (Year, Month Name) MultiIndex) as the input: sort the rows by year and month number, so that level 1 of the MultiIndex comes out in calendar order:
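import calendar
# map month abbreviations to numbers: {'': 0, 'Jan': 1, ..., 'Dec': 12}
d = {name: num for num, name in enumerate(calendar.month_abbr)}
# for full month names use: d = {name: num for num, name in enumerate(calendar.month_name)}
# sort the index tuples by (Year, month number), then reindex so the rows move with their labels
order = sorted(merged_df.index, key=lambda x: (x[0], d[x[1]]))
merged_df = merged_df.reindex(order)
print(merged_df)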
count min mean
Year Month Name
2014 Jan 1 2 2
Dec 1 1 1
2015 Aug 1 1 1
2016 Apr 1 4 4
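The core of this approach is the sort key. Isolated from pandas, it can be sketched as below; month_num and labels are illustrative names, and calendar.month_abbr only returns English abbreviations under the default C/English locale (it follows the current LC_TIME setting):

```python
import calendar

# calendar.month_abbr is ['', 'Jan', ..., 'Dec'] under the default C/English locale;
# skipping the empty placeholder at index 0 gives Jan -> 1 ... Dec -> 12
month_num = {name: num for num, name in enumerate(calendar.month_abbr) if name}

# (Year, Month Name) labels in the alphabetical order groupby produces
labels = [(2014, "Dec"), (2014, "Jan"), (2015, "Aug"), (2016, "Apr")]

# Sort by year first, then by month number instead of month name
ordered = sorted(labels, key=lambda ym: (ym[0], month_num[ym[1]]))
print(ordered)  # [(2014, 'Jan'), (2014, 'Dec'), (2015, 'Aug'), (2016, 'Apr')]
```

Using month_num[...] rather than .get(...) makes an unexpected month label fail loudly with a KeyError instead of producing an unsortable None.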
Upvotes: 1
Reputation: 10531
Simply tell groupby not to sort the group keys (it sorts them by default; see the docs):
df.groupby(["Year", "Month Name"], as_index=False, sort=False)["Days"].agg(
["min", "mean"]
)
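NOTE: with sort=False the groups come out in the order in which they first appear, so make sure df is already sorted chronologically (for example by order_date) before applying groupby.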
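To illustrate the note, here is a minimal sketch on hypothetical data (not the question's rows): the frame is sorted by order_date first, and groupby(..., sort=False) then emits the groups in first-appearance order, which is chronological:

```python
import pandas as pd

# Hypothetical rows, deliberately out of chronological order
df = pd.DataFrame({
    "order_date": ["2016-02-03", "2015-11-15", "2016-01-21", "2015-08-20", "2016-02-28"],
    "Days": [3, 5, 2, 1, 4],
})
df["order_date"] = pd.to_datetime(df["order_date"])
df["Year"] = df["order_date"].dt.year
# month_name() returns English names by default; keep the 3-letter abbreviation
df["Month Name"] = df["order_date"].dt.month_name().str[:3]

# Sort chronologically first, then group without re-sorting the keys
df = df.sort_values("order_date")
out = df.groupby(["Year", "Month Name"], sort=False)["Days"].agg(["min", "mean"])
print(out)
```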
Upvotes: 3