kim

Reputation: 567

How to make annotated grouped stacked barchart in matplotlib?

I have COVID-19 tracking time series data that I scraped from a COVID-19 tracking site, and I want to make an annotated grouped stacked bar chart from it. Using matplotlib and seaborn I managed to render the data as a bar chart, but the annotation approaches I found on SO did not give a correctly annotated plot, and I also could not get a grouped stacked bar chart for the time series. Can anyone suggest a possible way of doing this?

my attempt

Here is my attempt, using the reproducible time series data that I scraped from the COVID-19 tracking site:

import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns

bigdf = pd.read_csv("coviddf.csv")
bigdf['run_date'] = pd.to_datetime(bigdf['run_date'])

for g, d in bigdf.groupby(['company']):
    data = d.groupby(['run_date','county-state', 'company', 'est'], as_index=True).agg({'new': sum, 'confirmed': sum, 'death': sum}).stack().reset_index().rename(columns={'level_4': 'type', 0: 'val'})
    print(f'{g}')
    g = sns.FacetGrid(data, col='est', sharex=False, sharey=False, height=5, col_wrap=4)
    g.map(sns.barplot, 'run_date', 'val', 'type', order=data.run_date.dt.date.unique(), hue_order=data['type'].unique())
    g.add_legend()
    g.set_xticklabels(rotation=90)
    g.set(yscale='log')
    plt.tight_layout()
    plt.show()

I have a couple of issues with the above attempt. I need a grouped stacked bar chart where each group is a company and each stacked bar is an individual establishment (the est column in coviddf.csv). A company can have multiple establishments, and I want to see the number of new, confirmed and death COVID-19 cases for each of them. Is there any way to make an annotated grouped stacked bar chart for this time series, and to put all of these plots on one page?

desired output

I tried to make a grouped stacked bar chart the way this post and a second related post did. Here is the desired annotated grouped stacked bar chart that I want to make:

[image: desired annotated grouped stacked bar chart]

Can anyone point out how to get this from my current attempt?

Upvotes: 0

Views: 656

Answers (2)

Trenton McKinney

Reputation: 62413

Grouped Bar Plot

  • This is not exactly what you've asked for, but I think it's a better option.
    • It's certainly an easier option.
    • The issue with stacked bars is that confirmed is so large compared to the other values that you will not be able to see new and death.
  • I think the best option for this data is a horizontal bar plot with a group for each company & est.
import pandas as pd
import matplotlib.pyplot as plt

# load the data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop this extra column

# select columns and shape the dataframe
dfs = df.iloc[:, [2, 3, 4, 12, 13]].set_index(['company', 'est']).sort_index(level=0)

# display(dfs)
                      confirmed  new  death
company        est                         
Agri  Co.      235        10853    0    237
CS  Packers    630        10930   77    118
Caviness       675          790    5     19
Central Valley 6063A       6021   44     72
FPL            332         5853   80    117

# plot
ax = dfs.plot.barh(figsize=(8, 25), width=0.8)
plt.xscale('log')
plt.grid(True)
plt.tick_params(labelbottom=True, labeltop=True)
plt.xlim(10**0, 1000000)

# annotate the bars
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The width of the bar is the count value and can be used as the label
    label_text = f'{width:.0f}'

    label_x = x + width
    label_y = y + height / 2

    # don't include a label if the value is effectively 0
    if width > 0.001:
        ax.annotate(label_text, xy=(label_x, label_y), va='center', xytext=(2, -1), textcoords='offset points')
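
As a possible alternative to the manual loop over ax.patches, matplotlib 3.4 and later provide Axes.bar_label, which labels every bar in a container. The sketch below is only an illustration: it rebuilds dfs from the five rows shown in the display(dfs) output above instead of downloading the file, and it selects a headless backend so it runs without a display. Both of those choices are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption, so the sketch runs anywhere)
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the display(dfs) output above; the real file has more rows
dfs = pd.DataFrame(
    {"confirmed": [10853, 10930, 790, 6021, 5853],
     "new": [0, 77, 5, 44, 80],
     "death": [237, 118, 19, 72, 117]},
    index=pd.MultiIndex.from_tuples(
        [("Agri  Co.", "235"), ("CS  Packers", "630"), ("Caviness", "675"),
         ("Central Valley", "6063A"), ("FPL", "332")],
        names=["company", "est"]),
)

ax = dfs.plot.barh(figsize=(8, 6), width=0.8)
ax.set_xscale("log")
ax.set_xlim(1, 1_000_000)

# pandas creates one BarContainer per column; label each bar in each container
annotations = []
for container in ax.containers:
    # use an empty string for zero-valued bars so they get no visible label
    labels = [f"{v:.0f}" if v > 0 else "" for v in container.datavalues]
    annotations.extend(ax.bar_label(container, labels=labels, padding=2))
```

The padding argument plays the role of the xytext offset in the loop above. On older matplotlib versions bar_label is not available, so the manual loop remains the fallback.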


Stacked Bar Plot

  • new and death are barely visible compared to confirmed.
dfs.plot.barh(stacked=True, figsize=(8, 15))
plt.xscale('log')
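
For the grouped and stacked layout the question asks for (one group per company, one stacked bar per establishment), one way to sketch it in plain matplotlib is to compute the x positions by hand and leave a gap between companies. The numbers, company names and establishment names below are hypothetical placeholders, not values from coviddf.csv, and the sketch assumes the rows are already sorted by company.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption, so the sketch runs anywhere)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical placeholder data (NOT taken from coviddf.csv), sorted by company
toy = pd.DataFrame({
    "company": ["A", "A", "B", "B", "B"],
    "est": ["a1", "a2", "b1", "b2", "b3"],
    "new": [5, 8, 3, 6, 2],
    "death": [2, 1, 4, 2, 3],
    "confirmed": [40, 55, 30, 25, 60],
})
metrics = ["new", "death", "confirmed"]

# x positions: consecutive bars inside a company, an extra gap between companies
gap = 1.0
x, positions, group_centers = 0.0, [], []
for company, grp in toy.groupby("company", sort=False):
    xs = x + np.arange(len(grp))
    positions.extend(xs)
    group_centers.append((company, xs.mean()))
    x = xs[-1] + 1 + gap
positions = np.array(positions)

fig, ax = plt.subplots(figsize=(8, 4))
bottom = np.zeros(len(toy))
for metric in metrics:
    # stack each metric on top of the running total for every establishment
    bars = ax.bar(positions, toy[metric], 0.8, bottom=bottom, label=metric)
    ax.bar_label(bars, label_type="center")  # annotate each segment with its value
    bottom = bottom + toy[metric].to_numpy()

ax.set_xticks(positions)
ax.set_xticklabels(toy["est"])
# company names below the establishment tick labels (x in data, y in axes coords)
for company, center in group_centers:
    ax.text(center, -0.15, company, ha="center", va="top",
            transform=ax.get_xaxis_transform())
ax.legend()
```

With the real data the scale problem described above still applies, so a log axis or separate panels may be needed for new and death to be readable.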


Upvotes: 3

Jacob K

Reputation: 784

I had trouble finding info on how to create a GROUPED and STACKED bar chart in matplotlib and later Plotly.

Here is my attempt at solving your problem (using Plotly):

# Import packages
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Load data (I used the raw GitHub link so that no local file download was required)
bigdf = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")

# Get all companies names and number of companies
allComp = np.unique(bigdf.company)
numComp = allComp.shape[0]

# For all the companies
for i in range(numComp):
    # Grab company data and the names of the establishments for that company
    comp = allComp[i]
    compData = bigdf.loc[bigdf.company == comp]
    estabs = compData.est.to_numpy().astype(str)
    numEst = compData.shape[0]

    # Grab the new, confirmed, and death values for each of the establishments in that company
    newVals = []
    confirmedVals = []
    deathVals = []
    for j in range(numEst):  # j avoids shadowing the outer loop variable i
        estabData = compData.loc[compData.est == estabs[j]]
        newVals.append(estabData.new.to_numpy()[0])
        confirmedVals.append(estabData.confirmed.to_numpy()[0])
        deathVals.append(estabData.death.to_numpy()[0])

    # Load that data into a Plotly graph object
    fig = go.Figure(
        data=[
            go.Bar(name='New', x=estabs, y=newVals, yaxis='y', offsetgroup=1),
            go.Bar(name='Confirmed', x=estabs, y=confirmedVals, yaxis='y', offsetgroup=2),
            go.Bar(name='Death', x=estabs, y=deathVals, yaxis='y', offsetgroup=3)
        ]
    )

    # Update the layout (add title, set x/y axis titles, and bar graph mode)
    fig.update_layout(title='COVID Data for ' + comp, xaxis=dict(type='category'), xaxis_title='Establishment', 
                      yaxis_title='Value', barmode='stack')
    fig.show()

The output is 16 separate Plotly graphs, one per company. They are interactive, so you can turn individual traces on and off, which helps because scaling the new/confirmed/death values together wasn't so easy. Each plot has all the establishments for that company on the x-axis, and the new/confirmed/death values for each establishment as a stacked bar chart.
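
The nested loops that collect the per-establishment values could probably also be written with pandas groupby. The following is a rough sketch rather than a drop-in replacement: it uses only the five rows printed in the other answer's table instead of the full file, and it assumes that keeping the first row per (company, est) matches the [0] indexing in the loop above. The names first_rows and per_company are made up for this example.

```python
import pandas as pd

# Rows copied from the table printed in the other answer; the real file has more rows
bigdf = pd.DataFrame({
    "company": ["Agri  Co.", "CS  Packers", "Caviness", "Central Valley", "FPL"],
    "est": ["235", "630", "675", "6063A", "332"],
    "confirmed": [10853, 10930, 790, 6021, 5853],
    "new": [0, 77, 5, 44, 80],
    "death": [237, 118, 19, 72, 117],
})

# Assumption: keep the first row per (company, est), like the [0] indexing above
first_rows = bigdf.drop_duplicates(subset=["company", "est"])

# One dict of plain lists per company, ready to pass as x/y to go.Bar
per_company = {
    comp: {
        "estabs": grp["est"].astype(str).tolist(),
        "new": grp["new"].tolist(),
        "confirmed": grp["confirmed"].tolist(),
        "death": grp["death"].tolist(),
    }
    for comp, grp in first_rows.groupby("company")
}
```

Each entry of per_company then holds the same lists that estabs, newVals, confirmedVals and deathVals hold inside the loop.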

Here is an example plot: HBS Company COVID Data

I know this doesn't completely answer your question, but I hope you appreciate my effort :)

Upvotes: 2
