Reputation: 227
I have this example dataset
import numpy as np
import pandas as pd
products = ["A", "B", "C", "D"]
stores = ["store1", "store2", "store3"]
n = 30
product_list = [products[i] for i in np.random.randint(0, len(products), n)]
store_list = [stores[i] for i in np.random.randint(0, len(stores), n)]
rating_list = np.random.random(n) * 5
sales_list = np.random.random(n) * 10000
df = pd.DataFrame(
{'store': store_list,
'product': product_list,
'sales': sales_list,
'rating': rating_list})
and then sum the sales
df_1=df.groupby(['store','product']).agg({'sales':['sum']})
df_1
and ordered it by highest sales while keeping the rows grouped by store
df_2 = df_1.groupby(level=0, group_keys=False).apply(
lambda x: x.sort_values(('sales', 'sum'), ascending=False))
df_2
How can I facet by the store, so the resulting visualization is like the following?
Upvotes: 2
Views: 337
Reputation: 62403
pandas.DataFrame.plot
Plot directly with pandas.DataFrame.plot by shaping the data with pandas.DataFrame.pivot_table.
Tested in python 3.8.11, matplotlib 3.4.2, seaborn 0.11.2, and pandas 1.3.1.
import pandas as pd
import matplotlib.pyplot as plt
# using the sample data; reshape df
dfp = df.pivot_table(index='product', columns='store', values='sales', aggfunc='sum')
# display(dfp)
store          store1        store2        store3
product
A         9303.543781  15323.422183  20738.561588
B                 NaN   7549.028221           NaN
C        13976.321362  22350.050356   9865.392344
D         6905.455849   3183.767513   6010.941242
# plot
dfp.plot(kind='bar', subplots=True, layout=(1, 3), figsize=(8, 4), legend=False, rot=0,
sharey=True, title='Store Sales by Product', ylabel='Total Sales')
plt.show()
Without subplots=True, all stores are drawn in a single plot:
dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Store Sales by Product', ylabel='Total Sales')
plt.show()
Swapping index and columns tells a different story:
dfp = df.pivot_table(index='store', columns='product', values='sales', aggfunc='sum')
dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Product Sales by Store', ylabel='Total Sales')
plt.show()
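To make the reshaping step easier to follow, here is a minimal sketch of what pivot_table does with the two index/columns choices used above. The toy frame and its values are invented for illustration and are not the random data from the question; the names toy, by_product and by_store are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (values made up for illustration only)
toy = pd.DataFrame({
    "store":   ["store1", "store1", "store2", "store2", "store2"],
    "product": ["A", "A", "A", "B", "B"],
    "sales":   [100.0, 50.0, 30.0, 20.0, 5.0],
})

# One column per store: with subplots=True each column becomes its own panel
by_product = toy.pivot_table(index="product", columns="store",
                             values="sales", aggfunc="sum")

# One column per product: same totals, rows and columns swapped
by_store = toy.pivot_table(index="store", columns="product",
                           values="sales", aggfunc="sum")

print(by_product)
print(by_store)
```

A store/product combination with no rows (product B in store1 here) shows up as NaN, just like product B in the table above, and is simply left out of the bar plot.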
seaborn.catplot
With .catplot this can be done without .groupby or .pivot_table because kind='bar' has an estimator parameter.
Faceting with col=:
import seaborn as sns
sns.catplot(kind='bar', data=df, col='store', x='product', y='sales',
order=sorted(products), col_order=sorted(stores), estimator=sum, ci=False, height=3)
plt.show()
Using hue= draws all stores on one axes. Note that the input data (df) for this plot is different from the pandas plots, which use the reshaped dfp.
sns.catplot(kind='bar', data=df, hue='store', x='product', y='sales', height=3,
            hue_order=sorted(stores), estimator=sum, ci=False, order=sorted(products))
plt.show()
Upvotes: 2
Reputation: 12496
You should reset the index in the last step:
df_2 = df_1.groupby(level=0, group_keys=False).apply(
lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()
Then you can plot with seaborn.FacetGrid:
g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')
plt.show()
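If the bars in each panel should also follow the highest-to-lowest order within each store, one possible approach is to aggregate into flat column names and draw each store on its own matplotlib axes. This is only a sketch under assumptions: the toy data is invented, the names toy, totals and store_names are hypothetical, and it uses plain matplotlib rather than FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data (values made up for illustration only)
toy = pd.DataFrame({
    "store":   ["store1", "store1", "store1", "store1",
                "store2", "store2", "store2", "store2"],
    "product": ["A", "A", "B", "C", "A", "B", "B", "C"],
    "sales":   [100.0, 50.0, 40.0, 90.0, 30.0, 20.0, 5.0, 60.0],
})

# Named aggregation keeps the columns flat (no ('sales', 'sum') MultiIndex)
totals = toy.groupby(["store", "product"], as_index=False).agg(sales=("sales", "sum"))

# Sort by store, then by sales descending within each store
totals = totals.sort_values(["store", "sales"], ascending=[True, False])

# One panel per store; each panel keeps its own bar order
store_names = list(totals["store"].unique())
fig, axes = plt.subplots(1, len(store_names), figsize=(8, 3),
                         sharey=True, squeeze=False)
for ax, name in zip(axes[0], store_names):
    part = totals[totals["store"] == name]
    ax.bar(part["product"], part["sales"])
    ax.set_title(name)
axes[0][0].set_ylabel("Total Sales")
fig.tight_layout()
```

For interactive use, drop the matplotlib.use("Agg") line and finish with plt.show().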
Upvotes: 3