Phrixus
Phrixus

Reputation: 1219

Pandas and matplotlib stacked bar chart with major and minor x-ticks grouped together

I have the following data:

id, approach, outcome
a1, approach1, outcome1
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a2, approach1, outcome2
a2, approach1, outcome1
a2, approach1, outcome1
a2, approach1, outcome2
a2, approach1, outcome1
a2, approach2, outcome1
a2, approach2, outcome1
a2, approach2, outcome2
a2, approach2, outcome1
a2, approach2, outcome2
a2, approach3, outcome2
a2, approach3, outcome2
a2, approach3, outcome1
a2, approach3, outcome2
a2, approach3, outcome1

I found the following chart from another user which is exactly what I am looking to accomplish: enter image description here

But instead of fruits we have ids and instead of years we have approaches.

Here is what I have done so far:

df = pandas.read_csv("test.txt", sep=r',\s+', engine = "python")
fig, ax = plt.subplots(1, 1, figsize=(5.5, 4))

data = df[df.approach == "approach1"].groupby(["id", "outcome"], sort=False)["outcome"].count().unstack(level=1)
data.plot.bar(width=0.5, position=0.6, color=["g", "r"], stacked=True, ax=ax)

data = df[df.approach == "approach2"].groupby(["id", "outcome"], sort=False)["outcome"].count().unstack(level=1)
data.plot.bar(width=0.5, position=-0.6, color=["g", "r"], stacked=True, ax=ax)

# "Activate" minor ticks
ax.minorticks_on()

rects_locs = []
p = 0
for patch in ax.patches:
    rects_locs.append(patch.get_x() + patch.get_width())
    # p += 0.01

# Set minor ticks there
ax.set_xticks(rects_locs, minor = True)

# Labels for the rectangles
new_ticks = ["Approach1"] * 10 + ["Approach2"] * 10

# Set the labels
from matplotlib import ticker
ax.xaxis.set_minor_formatter(ticker.FixedFormatter(new_ticks))  #add the custom ticks

# Move the category label further from x-axis
ax.tick_params(axis='x', which='major', pad=15)

# Remove minor ticks where not necessary
ax.tick_params(axis='x',which='both', top='off')
ax.tick_params(axis='y',which='both', left='off', right = 'off')
plt.xticks(rotation=0)

But the output is not nice: enter image description here

So basically I want to have id as the major x-tick (so there should be 2 such x values) and then for each id there should be 3 grouped stacked bars (approach1, approach2, approach3).

Upvotes: 0

Views: 350

Answers (1)

jwalton
jwalton

Reputation: 5686

Well, I'm not proud of it. But it works. Hopefully somebody more knowledgeable will come along with a better solution.

I start by setting up your data:

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

data = np.array([
'id', 'approach', 'outcome',
'a1', 'approach1', 'outcome1',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a2', 'approach1', 'outcome2',
'a2', 'approach1', 'outcome1',
'a2', 'approach1', 'outcome1',
'a2', 'approach1', 'outcome2',
'a2', 'approach1', 'outcome1',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome2',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome2',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome1',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome1'])

data = data.reshape(data.size // 3, 3)

df = pd.DataFrame(data[1:], columns=data[0])

Next I count up all the occurrences of "outcome1" and "outcome2" for each approach and id. (I'm sure this could be done directly in pandas, but I'm a bit of a pandas novice):

dict = {}

for id in 'a1', 'a2':
    dict[id] = {}
    for approach in 'approach1', 'approach2', 'approach3':
        dict[id][approach] = {}
        for outcome in 'outcome1', 'outcome2':
            dict[id][approach][outcome] = ((df['id'] == id)
                                         & (df['approach'] == approach)
                                         & (df['outcome'] == outcome)).sum()

plot_data = pd.DataFrame(dict)

Now all that is left is to do the plotting.

fig, ax = plt.subplots(1, 1)

i = 0
for id in 'a1', 'a2':
    for approach in 'approach1', 'approach2', 'approach3':
        ax.bar(i, plot_data[id][approach]["outcome1"], color='g')
        ax.bar(i, plot_data[id][approach]["outcome2"],
               bottom=plot_data[id][approach]["outcome1"], color='r')
        i += 1
    i+=1

ax.set_xticklabels(['', 'approach1', 'approach2', 'approach3', '',
                    'approach1', 'approach2', 'approach3'], rotation=45)

custom_lines = [Line2D([0], [0], color='g', lw=4),
                Line2D([0], [0], color='r', lw=4)]

ax.legend(custom_lines, ['Outcome 1', 'Outcome 2'])

enter image description here

Upvotes: 1

Related Questions