Reputation: 55
With the code below, I group my DataFrame by the month of the date and compute some aggregate columns. This part works well:
all_together = (df_clean.groupby(df_clean['ContractDate'].dt.strftime('%B'))
.agg({'Amount': [np.sum, np.mean, np.min, np.max]})
.rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'amin': 'min_amount', 'amax': 'max_amount'}))
But for some reason, when I try to plot the result (with any kind of plot), it doesn't recognize "ContractDate" as a column, nor any of the renamed columns such as 'sum_amount'.
Do you have any idea what the issue is, and what rule for plotting the data I'm missing?
I have tried the code below for plotting, and it complains that it doesn't know "ContractDate" or "sum_amount"!
all_together.groupby(df_clean['ContractDate'].dt.strftime('%B'))['sum_amount'].nunique().plot(kind='bar')
#or
all_together.plot(kind='bar',x='ContractDate',y='sum_amount')
I really appreciate your time
Cheers, z.A
Upvotes: 0
Views: 730
Reputation: 989
When you apply groupby to a DataFrame, the grouping column becomes the index (ContractDate in your case). So you need to reset the index first to turn it back into a column.
import numpy as np
import pandas as pd
df = pd.DataFrame({'month':['jan','feb','jan','feb'],'v2':[23,56,12,59]})
t = df.groupby('month').agg('sum')
Output:
        v2
month
feb    115
jan     35
As you can see, the months end up as the index. When you reset the index, they become a regular column again:
t.reset_index()
Output:
  month   v2
0   feb  115
1   jan   35
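To make the index-versus-column point concrete, here is a small sketch using the same toy data. It also shows `as_index=False`, an alternative to calling `reset_index()` afterwards; the variable names `t_indexed` and `t_flat` are just mine for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# Default behaviour: the grouping key becomes the index, not a column.
t_indexed = df.groupby('month').agg('sum')
print('month' in t_indexed.columns)  # False
print(t_indexed.index.name)          # month

# Alternative: ask groupby to keep the key as a regular column.
t_flat = df.groupby('month', as_index=False).agg('sum')
print(list(t_flat.columns))          # ['month', 'v2']
```

Either way, once 'month' is a real column, something like `t_flat.plot(kind='bar', x='month', y='v2')` should be able to find it.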
Next, when you apply multiple aggregation functions to a single column in the groupby, the result has MultiIndex (two-level) columns. So you need to flatten them to a single level:
t = df.groupby('month').agg({'v2': [np.sum, np.mean, np.min, np.max]}).rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'amin': 'min_amount', 'amax': 'max_amount'})
              v2
      sum_amount avg_amount min_amount max_amount
month
feb          115       57.5         56         59
jan           35       17.5         12         23
This created a MultiIndex. If you check t.columns, you get:
MultiIndex(levels=[['v2'], ['avg_amount', 'max_amount', 'min_amount', 'sum_amount']],
labels=[[0, 0, 0, 0], [3, 0, 2, 1]])
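This is why the plot call can't find 'sum_amount': with two column levels, a bare name is only looked up in the top level. A rough sketch of how you could check that yourself, assuming the same toy data (I use string function names here instead of the NumPy functions, purely to keep the example short):

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

t = (df.groupby('month')
       .agg({'v2': ['sum', 'mean', 'min', 'max']})
       .rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount',
                        'min': 'min_amount', 'max': 'max_amount'}))

# Two levels: the original column name on top, the renamed aggregates below.
print(t.columns.nlevels)

# A bare name is matched against the top level only, so this lookup fails...
print('sum_amount' in t.columns)

# ...while the full (top, bottom) tuple is found.
print(('v2', 'sum_amount') in t.columns)
```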
Now keep only the second column level and reset the index:
t.columns = t.columns.get_level_values(1)
t.reset_index(inplace=True)
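One caveat: taking only level 1 works here because a single column was aggregated. If you aggregate several columns, the second-level names can collide. A possible variation, sketched under the assumption of an extra column (`v3` is made up for this example), is to join both levels into one name:

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59],
                   'v3': [1, 2, 3, 4]})  # v3 is a hypothetical second column

t = df.groupby('month').agg({'v2': ['sum', 'mean'], 'v3': ['sum']})

# Each column label is a (top, bottom) tuple; join the parts into one string.
t.columns = ['_'.join(col) for col in t.columns]
t = t.reset_index()

print(list(t.columns))  # ['month', 'v2_sum', 'v2_mean', 'v3_sum']
```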
You will get a clean dataframe:
  month  sum_amount  avg_amount  min_amount  max_amount
0   feb         115        57.5          56          59
1   jan          35        17.5          12          23
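If your pandas version supports named aggregation (as far as I know, 0.25 and later), you may be able to skip the MultiIndex entirely. This is only a sketch of that alternative, not what the code above does. It also avoids relying on labels like 'amin', which I believe can differ between versions:

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# Named aggregation: new_column=(source_column, function) gives flat names.
t = (df.groupby('month')
       .agg(sum_amount=('v2', 'sum'),
            avg_amount=('v2', 'mean'),
            min_amount=('v2', 'min'),
            max_amount=('v2', 'max'))
       .reset_index())  # turn the 'month' index back into a column

print(t)
```

With a frame shaped like this, `t.plot(kind='bar', x='month', y='sum_amount')` should find both names. For your data, the equivalent would be grouping on the month string from 'ContractDate' and aggregating 'Amount'.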
Hope this helps with your plotting.
Upvotes: 1