Reputation: 567
I have COVID-19 tracking time series data that I scraped from a COVID-19 tracking site, and I want to make an annotated grouped stacked bar chart from it. I used matplotlib and seaborn for plotting and worked out how to render a basic bar chart. I tried the plot annotation approaches I found on SO but didn't get a correctly annotated plot, and I am also having trouble getting a grouped stacked bar chart for the time series data. Can anyone suggest a possible way of doing this?
my attempt
here is the reproducible time series data that I scraped from covid19 tracking site:
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns
bigdf = pd.read_csv("coviddf.csv")
bigdf['run_date'] = pd.to_datetime(bigdf['run_date'])
for g, d in bigdf.groupby(['company']):
    data = d.groupby(['run_date','county-state', 'company', 'est'], as_index=True).agg({'new': sum, 'confirmed': sum, 'death': sum}).stack().reset_index().rename(columns={'level_4': 'type', 0: 'val'})
    print(f'{g}')
    g = sns.FacetGrid(data, col='est', sharex=False, sharey=False, height=5, col_wrap=4)
    g.map(sns.barplot, 'run_date', 'val', 'type', order=data.run_date.dt.date.unique(), hue_order=data['type'].unique())
    g.add_legend()
    g.set_xticklabels(rotation=90)
    g.set(yscale='log')
    plt.tight_layout()
    plt.show()
I have a couple of issues with the above attempt. I need a grouped stacked bar chart where each group is a company and each stacked bar is an individual establishment (the est column in coviddf.csv). A company may have multiple establishments, and I want to see the number of new, confirmed and death COVID-19 cases for each one. Is there any way to make an annotated grouped stacked bar chart for this time series, and how can I put these plots on one page?
desired output
I tried to make grouped stacked barchart like this post and second related post did. Here is the desired annotated grouped stacked barchart that I want to make:
Can anyone point out how to get there from my current attempt?
Upvotes: 0
Views: 656
Reputation: 62413
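confirmed is so large compared to the other values that you will not be able to see new and death in a stacked bar chart.
A horizontal bar plot grouped by company & est keeps every value readable.
import pandas as pd
import matplotlib.pyplot as plt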
# load the data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True) # drop this extra column
# select columns and shape the dataframe
dfs = df.iloc[:, [2, 3, 4, 12, 13]].set_index(['company', 'est']).sort_index(level=0)
# display(dfs)
                      confirmed  new  death
company        est
Agri Co.       235        10853    0    237
CS Packers     630        10930   77    118
Caviness       675          790    5     19
Central Valley 6063A       6021   44     72
FPL            332         5853   80    117
# plot
ax = dfs.plot.barh(figsize=(8, 25), width=0.8)
plt.xscale('log')
plt.grid(True)
plt.tick_params(labelbottom=True, labeltop=True)
plt.xlim(10**0, 1000000)
# annotate the bars
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # The width of the bar is the count value and can be used as the label
    label_text = f'{width:.0f}'
    label_x = x + width
    label_y = y + height / 2
    # don't include a label if the value is effectively 0
    if width > 0.001:
        ax.annotate(label_text, xy=(label_x, label_y), va='center', xytext=(2, -1), textcoords='offset points')
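As a shorter alternative to the manual loop over ax.patches, here is a minimal sketch using Axes.bar_label, assuming matplotlib 3.4 or newer. The five rows are typed in by hand from the table above rather than loaded from the gist, and the all_labels list exists only to make the result easy to inspect.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed copy of the five rows shown in the table above
dfs = pd.DataFrame(
    {
        "company": ["Agri Co.", "CS Packers", "Caviness", "Central Valley", "FPL"],
        "est": ["235", "630", "675", "6063A", "332"],
        "confirmed": [10853, 10930, 790, 6021, 5853],
        "new": [0, 77, 5, 44, 80],
        "death": [237, 118, 19, 72, 117],
    }
).set_index(["company", "est"])

ax = dfs.plot.barh(figsize=(8, 6), width=0.8)
ax.set_xscale("log")
ax.set_xlim(1, 1_000_000)

# pandas creates one BarContainer per column; bar_label puts text at each bar's end
all_labels = []
for container in ax.containers:
    # Blank out the label for zero-valued bars, as the loop above does
    labels = [f"{v:.0f}" if v > 0 else "" for v in container.datavalues]
    ax.bar_label(container, labels=labels, padding=2)
    all_labels.extend(labels)
```

The manual loop is still the way to go on older matplotlib versions, or when each label needs its own offset.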
In the stacked version, new and death are barely visible compared to confirmed.
dfs.plot.barh(stacked=True, figsize=(8, 15))
plt.xscale('log')
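To put a number on how small the new and death segments are, here is a rough sketch that computes each column's share of the row total. It uses only the five rows shown in the table earlier in this answer, typed in by hand, so the shares for the full file may differ.

```python
import pandas as pd

# Hand-typed copy of the five rows shown in the table earlier in this answer
dfs = pd.DataFrame(
    {
        "company": ["Agri Co.", "CS Packers", "Caviness", "Central Valley", "FPL"],
        "est": ["235", "630", "675", "6063A", "332"],
        "confirmed": [10853, 10930, 790, 6021, 5853],
        "new": [0, 77, 5, 44, 80],
        "death": [237, 118, 19, 72, 117],
    }
).set_index(["company", "est"])

# Divide every row by its own total to get the fraction each segment occupies
shares = dfs.div(dfs.sum(axis=1), axis=0)

# Print as percentages, one row per (company, est)
print((shares * 100).round(1))
```

For these rows confirmed takes up more than 95% of every stacked bar, which is why the other two segments are so hard to see on a linear axis.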
Upvotes: 3
Reputation: 784
I had trouble finding info on how to create a GROUPED and STACKED bar chart in matplotlib and later Plotly.
Here is my attempt at solving your problem (using Plotly):
# Import packages
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Load data (I used the raw GitHub link so that no local file download was required)
bigdf = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")

# Get all company names and the number of companies
allComp = np.unique(bigdf.company)
numComp = allComp.shape[0]

# For all the companies
for i in range(numComp):
    # Grab company data and the names of the establishments for that company
    comp = allComp[i]
    compData = bigdf.loc[bigdf.company == comp]
    estabs = compData.est.to_numpy().astype(str)
    numEst = compData.shape[0]

    # Grab the new, confirmed, and death values for each of the establishments in that company
    newVals = []
    confirmedVals = []
    deathVals = []
    for j in range(numEst):
        # Compare as strings, since estabs was converted to str above
        estabData = compData.loc[compData.est.astype(str) == estabs[j]]
        newVals.append(estabData.new.to_numpy()[0])
        confirmedVals.append(estabData.confirmed.to_numpy()[0])
        deathVals.append(estabData.death.to_numpy()[0])

    # Load that data into a Plotly graph object
    fig = go.Figure(
        data=[
            go.Bar(name='New', x=estabs, y=newVals, yaxis='y', offsetgroup=1),
            go.Bar(name='Confirmed', x=estabs, y=confirmedVals, yaxis='y', offsetgroup=2),
            go.Bar(name='Death', x=estabs, y=deathVals, yaxis='y', offsetgroup=3)
        ]
    )

    # Update the layout (add title, set x/y axis titles, and bar graph mode)
    fig.update_layout(title='COVID Data for ' + comp, xaxis=dict(type='category'), xaxis_title='Establishment',
                      yaxis_title='Value', barmode='stack')
    fig.show()
The output is 16 separate Plotly graphs, one per company. They are interactive, so you can toggle individual traces on and off, which helps because scaling the new/confirmed/death values against each other wasn't easy. Each plot has all the establishments for that company on the x-axis, with the new/confirmed/death values for each establishment drawn as a stacked bar.
I know this doesn't completely answer your question, but I hope you appreciate my effort :)
Upvotes: 2