Natasha

Reputation: 1521

Stacked bar plot from Dataframe using groupby

I have the following DataFrame and am trying to create a stacked bar plot from it:

import matplotlib.pyplot as plt
import pandas as pd


def classify_data():
    race = ['race1', 'race1', 'race1', 'race1', 'race2', 'race2', 'race2', 'race2']
    qualifier = ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first']
    participant = ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog']
    df = pd.DataFrame(
        {'race': race,
         'qualifier': qualifier,
         'participant': participant}
    )
    print(df)
    # Count rows per (race, qualifier) and pivot the qualifiers into columns
    df2 = df.groupby(['race', 'qualifier'])['race'].count().unstack('qualifier').fillna(0)
    df2[['first', 'last']].plot(kind='bar', stacked=True)
    plt.show()


classify_data()

I managed to obtain a single stacked plot with this code, but I want to create two plots from my DataFrame.

The first plot should contain the following data for the qualifier 'last':

Race1 rat 1
Race1 cat 0
Race1 dog 0 
Race2 rat 1
Race2 dog 1
Race2 cat 0

So the first bar plot would have two bars (one per race), with each bar color-coded by the count of each participant.

Likewise, a second plot is needed for the qualifier 'first'.

EDIT:

  Race1 rat 1
  Race1 cat 2
  Race1 dog 0 
  Race2 rat 0
  Race2 dog 2
  Race2 cat 0

From the original DataFrame, I need to create the two DataFrames above in order to draw the stacked plots.

I am not sure how to use the groupby function to get the count of each 'participant' for each 'qualifier' within a given 'race'.

EDIT 2: For the qualifier 'last', the desired plot would look like this (blue for rat, red for dog):

[image: desired plot for qualifier 'last']

For the qualifier 'first':

[image: desired plot for qualifier 'first']

Could someone suggest how to proceed from here?

Upvotes: 3

Views: 5564

Answers (1)

Quang Hoang

Reputation: 150815

IIUC, this is what you want:

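# Count rows per (race, qualifier, participant); participants become columns
df2 = (df.groupby(['race', 'qualifier', 'participant'])
         .size()
         .unstack(level=-1, fill_value=0)
         .reset_index()
      )

# One subplot per qualifier, sharing the y-axis for easy comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 6), sharey=True)
for ax, q in zip(axes.ravel(), ['first', 'last']):
    tmp_df = df2[df2.qualifier.eq(q)]
    tmp_df.plot.bar(x='race', ax=ax, stacked=True, title=q)

plt.show()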

Output:

[image: the resulting pair of stacked bar plots]
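If the zero counts listed in the question should appear explicitly in the tables, one possible variation is to build a separate race-by-participant table per qualifier with pd.crosstab and reindex it to the full set of races and participants. This is only a sketch: the helper names count_tables and plot_counts are made up for illustration, and passing a dict as colors assumes a pandas version that accepts a column-to-color mapping (1.1 or later, as far as I know).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'race': ['race1', 'race1', 'race1', 'race1', 'race2', 'race2', 'race2', 'race2'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})


def count_tables(df):
    """Hypothetical helper: return {qualifier: race x participant counts}."""
    races = sorted(df['race'].unique())
    participants = sorted(df['participant'].unique())
    tables = {}
    for q, sub in df.groupby('qualifier'):
        counts = pd.crosstab(sub['race'], sub['participant'])
        # Give every table the same rows and columns, using 0 for absent pairs.
        tables[q] = counts.reindex(index=races, columns=participants, fill_value=0)
    return tables


def plot_counts(tables, colors=None):
    """Hypothetical helper: draw one stacked bar subplot per qualifier."""
    fig, axes = plt.subplots(1, len(tables), figsize=(12, 6),
                             sharey=True, squeeze=False)
    for ax, (q, table) in zip(axes.ravel(), tables.items()):
        # colors may be None (default cycle) or a dict such as {'rat': 'blue', ...}
        table.plot.bar(ax=ax, stacked=True, color=colors)
        ax.set_title(f"qualifier = {q}")
    return fig


tables = count_tables(df)
print(tables['last'])
print(tables['first'])
```

Calling plot_counts(tables, colors={'rat': 'blue', 'dog': 'red', 'cat': 'green'}) followed by plt.show() should then draw both plots with a fixed color per participant. The green for cat is an arbitrary choice, since the question only specifies blue for rat and red for dog.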

Upvotes: 2
