Reputation: 1745
I am producing a bar plot with pandas in which the bar heights are raw counts, but I would like to annotate each bar with the percentage that its count represents of the whole. I have seen many people use ax.patches to annotate, but my values are unrelated to the get_height of the actual bars.
Here is some toy data. The plot shows the individual counts for each type. I want to add an annotation above each bar giving that type's percentage of the total across all types for that person's name.
Let me know if you need any more clarification.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'ID': [1,1,1,2,2,3,3,3,4],
     'name': ['bob','bob','bob','shelby','shelby','jordan','jordan','jordan','jeff'],
     'type': ['type1','type2','type4','type1','type6','type5','type8','type2',None]}
df: pd.DataFrame = pd.DataFrame(data=d)
df_pivot: pd.DataFrame = df.pivot_table(index='type', columns=['name'], values='ID', aggfunc={'ID': np.sum}).fillna(0)
# each type's share of that person's column total, in percent
df_pivot['bob_pct_total']: pd.Series = (df_pivot['bob']/df_pivot['bob'].sum()).mul(100).round(1)
df_pivot['shelby_pct_total']: pd.Series = (df_pivot['shelby']/df_pivot['shelby'].sum()).mul(100).round(1)
df_pivot['jordan_pct_total']: pd.Series = (df_pivot['jordan']/df_pivot['jordan'].sum()).mul(100).round(1)
df_pivot.head(10)
name   bob  jordan  shelby  bob_pct_total  shelby_pct_total  jordan_pct_total
type
type1  1.0     0.0     2.0           33.3              50.0               0.0
type2  1.0     3.0     0.0           33.3               0.0              33.3
type4  1.0     0.0     0.0           33.3               0.0               0.0
type5  0.0     3.0     0.0            0.0               0.0              33.3
type6  0.0     0.0     2.0            0.0              50.0               0.0
type8  0.0     3.0     0.0            0.0               0.0              33.3
fig, ax = plt.subplots(figsize=(15,15))
df_pivot.plot(kind='bar', y=['bob','jordan','shelby'], ax=ax)
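The percentage described above is each cell divided by its column total. As a minimal sketch, assuming the pivoted values match the printed table, the three hand-written lines could be collapsed into one vectorized step. The names counts and pct are illustrative and do not appear in the original code.

```python
import pandas as pd

# Values copied from the printed df_pivot table above (assumed, not recomputed)
counts = pd.DataFrame(
    {"bob": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
     "jordan": [0.0, 3.0, 0.0, 3.0, 0.0, 3.0],
     "shelby": [2.0, 0.0, 0.0, 0.0, 2.0, 0.0]},
    index=pd.Index(["type1", "type2", "type4", "type5", "type6", "type8"], name="type"),
)

# Divide every column by its own sum, then scale to percent and round
pct = counts.div(counts.sum(axis=0), axis=1).mul(100).round(1)

# Rename to match the *_pct_total naming used in the question
pct = pct.add_suffix("_pct_total")
print(pct)
```

This keeps the counts and the percentages in separate frames, so the plot can use one and the annotations the other.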
Upvotes: 0
Views: 234
Reputation: 80279
You can use the old approach: loop through the bars and use each bar's height to position whatever text you want. Since matplotlib 3.4.0 there is also a new function, bar_label, that removes much of the boilerplate:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'ID': [1, 1, 1, 2, 2, 3, 3, 3, 4],
     'name': ['bob', 'bob', 'bob', 'shelby', 'shelby', 'jordan', 'jordan', 'jordan', 'jeff'],
     'type': ['type1', 'type2', 'type4', 'type1', 'type6', 'type5', 'type8', 'type2', None]}
df: pd.DataFrame = pd.DataFrame(data=d)
df_pivot: pd.DataFrame = df.pivot_table(index='type', columns=['name'], values='ID', aggfunc={'ID': np.sum}).fillna(0)
# each type's share of that person's column total, in percent
df_pivot['bob_pct_total']: pd.Series = (df_pivot['bob'] / df_pivot['bob'].sum()).mul(100).round(1)
df_pivot['shelby_pct_total']: pd.Series = (df_pivot['shelby'] / df_pivot['shelby'].sum()).mul(100).round(1)
df_pivot['jordan_pct_total']: pd.Series = (df_pivot['jordan'] / df_pivot['jordan'].sum()).mul(100).round(1)
fig, ax = plt.subplots(figsize=(12, 5))
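columns = ['bob', 'jordan', 'shelby']
df_pivot.plot(kind='bar', y=columns, rot=0, ax=ax)
# ax.containers holds one group of bars per plotted column, in the same order as columns
for bars, col in zip(ax.containers, ['bob_pct_total', 'jordan_pct_total', 'shelby_pct_total']):
    # label each bar with its percentage; leave bars whose percentage is 0 unlabeled
    ax.bar_label(bars, labels=['' if val == 0 else f'{val}' for val in df_pivot[col]])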
plt.tight_layout()
plt.show()
PS: To skip labeling the first bar of each series (the type1 bars here), you could experiment with:
for bars, col in zip(ax.containers, ['bob_pct_total', 'jordan_pct_total', 'shelby_pct_total']):
    labels = ['' if val == 0 else f'{val}' for val in df_pivot[col]]
    labels[0] = ''  # blank out the label of the first bar in this series
    ax.bar_label(bars, labels=labels)
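For matplotlib versions before 3.4.0, the older approach mentioned at the top of this answer might look roughly like the sketch below. It assumes the counts match the table printed in the question. The names counts and pct, the 3-point text offset, and the Agg backend line are choices made for this illustration and are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the printed df_pivot table in the question (assumed)
counts = pd.DataFrame(
    {"bob": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
     "jordan": [0.0, 3.0, 0.0, 3.0, 0.0, 3.0],
     "shelby": [2.0, 0.0, 0.0, 0.0, 2.0, 0.0]},
    index=["type1", "type2", "type4", "type5", "type6", "type8"],
)
# Percent of each person's column total
pct = counts.div(counts.sum(axis=0), axis=1).mul(100).round(1)

fig, ax = plt.subplots(figsize=(12, 5))
counts.plot(kind="bar", rot=0, ax=ax)

# One container per plotted column; iterating a container yields its bars
for container, col in zip(ax.containers, counts.columns):
    for bar, val in zip(container, pct[col]):
        if val == 0:
            continue  # leave bars with a zero percentage unlabeled
        ax.annotate(
            f"{val}",
            # anchor at the top center of the bar: height gives the position only
            xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3), textcoords="offset points",
            ha="center", va="bottom",
        )
plt.tight_layout()
```

The bar height is used only to place the text, while the text itself comes from the separate percentage values, which is the decoupling the question asks for.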
Upvotes: 1