Napier
Napier

Reputation: 227

Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

I have this example dataset

products = ["A", "B", "C", "D"]
stores = ["store1", "store2", "store3"]
n = 30

product_list = [products[i] for i in np.random.randint(0, len(products), n)]
store_list = [stores[i] for i in np.random.randint(0, len(stores), n)]
rating_list = np.random.random(n) * 5
sales_list = np.random.random(n) * 10000

df = pd.DataFrame(
    {'store': store_list, 
     'product': product_list, 
     'sales': sales_list, 
     'rating': rating_list})

and then sum the sales

df_1=df.groupby(['store','product']).agg({'sales':['sum']})
df_1

enter image description here

and ordered it by highest sales while maintain the store

df_2 = df_1.groupby(level=0, group_keys=False).apply(
                   lambda x: x.sort_values(('sales', 'sum'), ascending=False))
df_2

enter image description here

How can I facet by the store, so the resulting visualization is like the following?

enter image description here

Upvotes: 2

Views: 337

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62403

  • Visualizations are about telling a story, and presenting data in a clear and concise manner to convey that story. As such, it is more clear to have the same xaxis order for each plot.
    • Someone looking at your visualization should be able to quickly discern which product from which store has the highest total sales, but that is not easy if the product categories for every axis are not in the same order
  • This can be done with pandas.DataFrame.plot by shaping the data with pandas.DataFrame.pivot_table.
  • Tested with python 3.8.11, matplotlib 3.4.2, seaborn 0.11.2, and pandas 1.3.1.
import pandas as pd
import matplotlib.pyplot as plt

# using the sample data; reshape df
dfp = df.pivot_table(index='product', columns='store', values='sales', aggfunc='sum')

# display(dfp)
store          store1        store2        store3
product                                          
A         9303.543781  15323.422183  20738.561588
B                 NaN   7549.028221           NaN
C        13976.321362  22350.050356   9865.392344
D         6905.455849   3183.767513   6010.941242

# plot
dfp.plot(kind='bar', subplots=True, layout=(1, 3), figsize=(8, 4), legend=False, rot=0,
         sharey=True, title='Store Sales by Product', ylabel='Total Sales')
plt.show()

enter image description here

  • This presentation is more clear without subplots (remove subplots=True)
    • It's more clear that product B only had sales at store 2
dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Store Sales by Product', ylabel='Total Sales')
plt.show()

enter image description here

  • Switching the categories for index and columns tells a different story
dfp = df.pivot_table(index='store', columns='product', values='sales', aggfunc='sum')

dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Product Sales by Store', ylabel='Total Sales')
plt.show()

enter image description here

seaborn.catplot

  • With .catplot this can be done without .groupby or .pivot_table because kind='bar' has an estimator parameter.
  • Using col=
import seaborn as sns

sns.catplot(kind='bar', data=df, col='store', x='product', y='sales',
            order=sorted(products), col_order=sorted(stores), estimator=sum, ci=False, height=3)
plt.show()

enter image description here

  • Using hue=
  • FYI, the random data (df) for this plot is different than the other plots.
sns.catplot(kind='bar', data=df, hue='store', x='product', y='sales', height=3,
            col_order=sorted(stores), estimator=sum, ci=False, order=sorted(products))
plt.show()

enter image description here

Upvotes: 2

Zephyr
Zephyr

Reputation: 12496

You should reset the index in the last passage:

df_2 = df_1.groupby(level=0, group_keys=False).apply(
                   lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

Then you can plot with seaborn.FacetGrid:

g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')

plt.show()

enter image description here

Upvotes: 3

Related Questions