Kelli-Jean

Reputation: 1447

Plotting percentage of totals with pandas group bys

I am trying to plot a bar chart of a pandas data frame that is the result of two group bys.

In particular, my data frame looks exactly like the output from another SO post's answer (https://stackoverflow.com/a/23377155/7243972):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
state = df.groupby(['state']).agg({'sales': 'sum'})
results = state_office.div(state, level='state') * 100

I would like to plot results so that each state is a different color and the office_id is on the x-axis. This is so that each office_id is grouped together and they can be easily compared.

I've tried adjusting the plot from results['sales'].plot.bar(), but I am struggling.

Upvotes: 0

Views: 2853

Answers (2)

ababuji

Reputation: 1731

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
state = df.groupby(['state']).agg({'sales': 'sum'})

results = state_office.div(state, level='state') * 100
results = results.reset_index()

fig, ax = plt.subplots()
# One scatter series per state; 'group' avoids shadowing the original df
for name, group in results.groupby('state'):
    ax.scatter(group['office_id'], group['sales'], label=name)
ax.legend()
ax.set_title('Scatterplot')
ax.set_xlabel('office_id')
ax.set_ylabel('sales')

This draws a scatter plot with one color per state rather than the bar chart you asked for. See if you can take it from here!
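Taking it from there, one possible way to get grouped bars with plain matplotlib is to keep the same per-state loop but call ax.bar and shift each state's bars sideways. This is only a sketch: the 0.8 total group width, the offset arithmetic, and the names `width`, `offset`, `name`, and `group` are illustrative choices, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
state = df.groupby(['state']).agg({'sales': 'sum'})
results = (state_office.div(state, level='state') * 100).reset_index()

n_states = results['state'].nunique()
width = 0.8 / n_states  # assumption: the states share 0.8 of each x unit

fig, ax = plt.subplots()
for i, (name, group) in enumerate(results.groupby('state')):
    # Shift each state's bars so they sit side by side around the office_id tick
    offset = (i - (n_states - 1) / 2) * width
    ax.bar(group['office_id'] + offset, group['sales'], width=width, label=name)

ax.set_xticks(sorted(results['office_id'].unique()))
ax.set_xlabel('office_id')
ax.set_ylabel('% of state sales')
ax.legend(title='state')
```

With this sample data each office_id only appears in two states, so every tick shows two bars and two empty slots.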

Upvotes: 0

offwhitelotus

Reputation: 1079

First you need to flatten the dataframe:

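data = []
# Each index entry is a (state, office_id) tuple; each row holds one 'sales' value
for (state, office_id), row in results.iterrows():
    sales = row.iloc[0]  # positional access; row[0] is deprecated in recent pandas
    data.append((state, office_id, sales))
flat_df = pd.DataFrame(data, columns=['state', 'office_id', 'sales'])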

Then plot:

import seaborn as sns
sns.set(style="whitegrid")

# factorplot was renamed to catplot in seaborn 0.9
g = sns.catplot(x="office_id", y="sales", hue="state", data=flat_df, kind="bar", palette="muted")

Edit: I just realized there is a simpler way to flatten the dataframe:

flat_df = results.reset_index()
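As a possible alternative that skips both the flattening step and seaborn, the state level of the index can be moved into columns so that pandas draws one bar color per state. This is a sketch under the question's own setup; the name `wide` and the axis label text are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
state = df.groupby(['state']).agg({'sales': 'sum'})
results = state_office.div(state, level='state') * 100

# Rows become office_id, columns become states; missing combinations are NaN
wide = results['sales'].unstack('state')

# One group of bars per office_id, one color per state
ax = wide.plot.bar(rot=0)
ax.set_xlabel('office_id')
ax.set_ylabel('% of state sales')
ax.legend(title='state')
```

Because `wide` keeps office_id as its index, the x-axis is grouped by office without any manual positioning.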

Upvotes: 1
