Reputation: 109
I'm trying to think of a way to most effectively display the following analysis. I'm using Python and Plotly for other analysis and would like to stick with that.
Say I have a number of newspapers. Each newspaper has a different amount of circulation worldwide. Within that, some percentage of the circulation is from the US. And within that, some percentage is from a given state, say, California.
I'd like to have a bar graph that shows, for one newspaper:
So I want a compact way to show
Then repeat for each newspaper and look for trends.
Plotly has a stacked bar chart which looks close, but I want to customize to specifically call out the three percentages. Each newspaper has a different total number, so a stacked bar chart normalized to 100% won't tell me the magnitude of each different newspaper.
I was thinking total %'s on the left of the bar, and US-specific %'s on the right of the bar. Or different colors?
Any advice is appreciated.
---Edit to add MWE---
import pandas as pd
import plotly.express as px
data = {
    'Name': ['Paper A', 'Paper B'],
    'Total circ': [1000000, 800000],
    'US circ': [500000, 200000],
    'CA': [100000, 100000],
}
df = pd.DataFrame.from_dict(data)
df['not CA'] = df['US circ'] - df['CA']
df['not US'] = df['Total circ'] - df['US circ']
fig = px.bar(df, x='Name', y=['not US', 'not CA', 'CA'], text_auto=True)
Upvotes: 0
Views: 924
Reputation: 35205
I wrote code that adds the composition ratios as labels, building on the bar chart you created. First, convert the data frame from wide format to long format, then calculate each region's share of its newspaper's total circulation. Next, filter the data frame down to the regions that make up the stack and use graph objects to build a stacked bar chart with one trace per region. Each label combines the number of copies (in thousands) with the composition ratio, and the labels are assigned to the traces inside the loop.
import plotly.graph_objects as go

# Reshape from wide to long: one row per (Name, Region) pair
df = df.melt(id_vars='Name', value_vars=df.columns[1:], var_name='Region', value_name='Volume')
# Share of each newspaper's total circulation ('Total circ' is the first row in each group)
df['percentage'] = 100 * df['Volume'] / df.groupby('Name')['Volume'].transform('first')
# Keep only the regions that make up the stack
df = df[(df['Region'] != 'Total circ') & (df['Region'] != 'US circ')].copy()
# Label each segment with its volume in thousands and its share of the total
df['label'] = ['{}k({}%)'.format(int(v / 1000), p) for v, p in zip(df['Volume'], df['percentage'])]

fig = go.Figure()
# One trace per region so that the bars stack
for r in df['Region'].unique():
    dff = df.query('Region == @r')
    fig.add_trace(go.Bar(x=dff['Name'], y=dff['Volume'], text=dff['label'], name=r))
fig.update_layout(barmode='stack', autosize=True, height=450)
fig.show()
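The question also asks for US-specific percentages, which the labels above do not include. As a rough sketch only, both ratios can be computed from the original wide-format data before reshaping. The column names `US % of total`, `CA % of total`, `CA % of US`, and `CA label`, as well as the label wording, are my own assumptions and not part of the code above.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Name': ['Paper A', 'Paper B'],
    'Total circ': [1000000, 800000],
    'US circ': [500000, 200000],
    'CA': [100000, 100000],
})

# Shares of the worldwide total
df['US % of total'] = 100 * df['US circ'] / df['Total circ']
df['CA % of total'] = 100 * df['CA'] / df['Total circ']
# Share within the US only
df['CA % of US'] = 100 * df['CA'] / df['US circ']

# Hypothetical label format: "<share of total> / <share of US>"
df['CA label'] = [
    '{:.1f}% of total / {:.1f}% of US'.format(t, u)
    for t, u in zip(df['CA % of total'], df['CA % of US'])
]
print(df[['Name', 'US % of total', 'CA % of total', 'CA % of US', 'CA label']])
```

A column like `CA label` could then be passed as the `text` of the CA trace in `go.Bar`, so that each segment shows both ratios.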
Upvotes: 1