Z.A

Reputation: 55

Plot a DataFrame based on a grouped-by column in Python

With the code below, I group my DataFrame by the month of the date and add some aggregated columns. This works well:

all_together = (df_clean.groupby(df_clean['ContractDate'].dt.strftime('%B'))
                  .agg({'Amount': [np.sum, np.mean, np.min, np.max]})
                  .rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'amin': 'min_amount', 'amax': 'max_amount'}))

But when I try to plot the result (with any kind of plot), it does not recognize "ContractDate" as a column, nor any of the renamed columns such as 'sum_amount'.

Do you have any idea what the issue is, and what rule for plotting the data I am missing?

I have tried the code below for plotting, and in both cases I get an error saying that "ContractDate" and "sum_amount" are not found:

all_together.groupby(df_clean['ContractDate'].dt.strftime('%B'))['sum_amount'].nunique().plot(kind='bar')
#or
all_together.plot(kind='bar',x='ContractDate',y='sum_amount')

I really appreciate your time

Cheers, z.A

Upvotes: 0

Views: 730

Answers (1)

Asetti sri harsha

Reputation: 989

When you apply groupby to a DataFrame, the grouping column becomes the index of the result (ContractDate in your case). So you need to reset the index first to turn it back into a column.

import pandas as pd
df = pd.DataFrame({'month':['jan','feb','jan','feb'],'v2':[23,56,12,59]})
t = df.groupby('month').agg('sum')

Output:

        v2
month
feb    115
jan     35

As you can see, the months end up in the index. When you reset the index, they become a regular column again:

t.reset_index()

Output:

    month   v2
0   feb     115
1   jan     35
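
To check which case you are in before plotting, you can inspect the index name and the column list. The following is a minimal sketch on the same toy data; the as_index=False variant is an alternative that is not part of the steps above and is shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# After groupby, the grouping key lives in the index, not in the columns.
t = df.groupby('month').agg('sum')
print(t.index.name)            # name of the index
print('month' in t.columns)    # whether 'month' is a regular column

# Option 1: move the index back into a regular column.
flat = t.reset_index()

# Option 2 (alternative): ask groupby not to use the key as the index at all.
flat_alt = df.groupby('month', as_index=False).agg('sum')

print(list(flat.columns))
print(list(flat_alt.columns))
```

Either result has 'month' as an ordinary column, so it can be passed as x= to plot.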

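Next, when you apply multiple aggregation functions to a single column in the groupby, the result has MultiIndex columns (the original column name on the first level, the function names on the second). So you need to flatten them to a single level:

# String function names give stable column labels ('sum', 'mean', 'min', 'max') across pandas versions
t = (df.groupby('month')
       .agg({'v2': ['sum', 'mean', 'min', 'max']})
       .rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'min': 'min_amount', 'max': 'max_amount'}))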

              v2
      sum_amount  avg_amount  min_amount  max_amount
month
feb          115        57.5          56          59
jan           35        17.5          12          23

This created a MultiIndex on the columns. If you check t.columns, you get:

MultiIndex(levels=[['v2'], ['avg_amount', 'max_amount', 'min_amount', 'sum_amount']],
           labels=[[0, 0, 0, 0], [3, 0, 2, 1]])

Now keep only the second level of the column labels and reset the index:

t.columns = t.columns.get_level_values(1)
t.reset_index(inplace=True)

You will get a clean dataframe:

    month   sum_amount  avg_amount  min_amount  max_amount
0   feb       115          57.5       56          59
1   jan       35           17.5       12          23
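
As an alternative to flattening the MultiIndex by hand, pandas 0.25 and later support named aggregation, which produces flat column names directly. This is a different technique from the one shown above, sketched here on the same toy data under the assumption that your pandas version is recent enough.

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# Each keyword is the output column name; the tuple is (input column, function).
t2 = (df.groupby('month')
        .agg(sum_amount=('v2', 'sum'),
             avg_amount=('v2', 'mean'),
             min_amount=('v2', 'min'),
             max_amount=('v2', 'max'))
        .reset_index())   # turn 'month' back into a regular column

print(t2)
```

With this form there is no rename step and no second column level to drop.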

Hope this helps for your plotting.
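
Putting the steps together for the plot in the question, a minimal sketch might look like the following. The rows in df_clean are made up for illustration; only the column names (ContractDate, Amount, sum_amount, avg_amount) come from the question. The Agg backend is selected only so the sketch runs without a display; drop that line for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import pandas as pd

# Hypothetical sample data standing in for the asker's df_clean.
df_clean = pd.DataFrame({
    'ContractDate': pd.to_datetime(['2019-01-05', '2019-01-20',
                                    '2019-02-03', '2019-02-17']),
    'Amount': [23, 12, 56, 59],
})

all_together = (df_clean.groupby(df_clean['ContractDate'].dt.strftime('%B'))
                .agg({'Amount': ['sum', 'mean']})
                .rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount'}))

# Drop the outer 'Amount' level, then turn the month index into a column.
all_together.columns = all_together.columns.get_level_values(1)
all_together = all_together.reset_index()

# Now both names are real columns, so plot can find them.
ax = all_together.plot(kind='bar', x='ContractDate', y='sum_amount')
ax.figure.savefig('sum_amount_by_month.png')
```

Note that grouping by month name sorts the bars alphabetically, not in calendar order.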

Upvotes: 1
