aripod

Reputation: 57

Plotting stacked barchart with pandas of multiple columns grouped

I have two dataframes. I need to compute their difference and then plot one of them stacked with that difference. Here is a minimal example:

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame([[2,5,7,6,7],[4,4,4,4,3],[8,8,7,3,4],[16,10,12,13,16]], columns=["N", "A", "B", "C", "D"])
df2 = pd.DataFrame([[2,1,3,6,5],[4,1,2,3,2],[8,2,2,3,3],[16,8,10,3,11]], columns=["N", "A", "B", "C", "D"])

dfDiff = df1 - df2
dfDiff['N'] = df1['N']

# Individual barchart
colors = ['#6c8ebf', '#82b366', '#F7A01D', '#9876a7']
df1.set_index('N')[["A", "B", "C", "D"]].plot.bar(color=colors)
df2.set_index('N')[["A", "B", "C", "D"]].plot.bar(color=colors)

dfStacked = pd.DataFrame(columns=["N", "A", "A_diff", "B", "B_diff"])
dfStacked["N"] = df2["N"]
dfStacked["A"] = df2["A"]
dfStacked["B"] = df2["B"]
dfStacked["C"] = df2["C"]
dfStacked["D"] = df2["D"]
dfStacked["A_diff"] = dfDiff["A"]
dfStacked["B_diff"] = dfDiff["B"]
dfStacked["C_diff"] = dfDiff["C"]
dfStacked["D_diff"] = dfDiff["D"]

dfStacked.set_index('N').plot.bar(stacked=True)

plt.show()

The dataframes look like this (images of df1 and df2 omitted).

The problem is that the new stacked chart ends up with everything merged into a single bar per N. I want "A" stacked with "A_diff", "B" stacked with "B_diff", "C" stacked with "C_diff" and "D" stacked with "D_diff".

For example, I changed the code to plot only "A" and "A_diff":

dfStacked.set_index('N')[["A", "A_diff"]].plot.bar(stacked=True)

This looks correct, but I want A, B, C and D grouped by N as in the first two figures.

Do I need a new dataframe for this, like dfStacked? If so, in which form should the content be added? And how can I keep the same colors but add hatch="/" only for the "top" stacked bar?

Would it be better to structure the dataframe as below?

df3 = pd.DataFrame(columns=["N", "Algorithm", "df1", "dfDiff"])
df3.loc[len(df3)] = [2, "A", 20, 10]
df3.loc[len(df3)] = [2, "A", 1, 4]
df3.loc[len(df3)] = [4, "A", 2, 3]
df3.loc[len(df3)] = [4, "A", 3, 4]
df3.loc[len(df3)] = [2, "B", 1, 3]
df3.loc[len(df3)] = [2, "B", 2, 4]
df3.loc[len(df3)] = [4, "B", 3, 3]
df3.loc[len(df3)] = [4, "B", 4, 2]

But then how do I group them by "N" and "Algorithm"? Each row corresponds to one bar; the bars should be grouped by "N" across all the "Algorithms", and the last two columns are the two parts of each bar. Ideally the colors would match the first two figures (one color per "Algorithm"), with the top part of each bar drawn with hatch="/", for example.

Upvotes: 2

Views: 395

Answers (2)

Scott Boston

Reputation: 153460

IIUC, try using the position parameter of pd.DataFrame.plot.bar:

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame([[2,5,7,6,7],[4,4,4,4,3],[8,8,7,3,4],[16,10,12,13,16]], columns=["N", "A", "B", "C", "D"])
df2 = pd.DataFrame([[2,1,3,6,5],[4,1,2,3,2],[8,2,2,3,3],[16,8,10,3,11]], columns=["N", "A", "B", "C", "D"])

dfDiff = df1 - df2
dfDiff['N'] = df1['N']

dfStacked = pd.DataFrame(columns=["N", "A", "A_diff", "B", "B_diff"])
dfStacked["N"] = df2["N"]
dfStacked["A"] = df2["A"]
dfStacked["B"] = df2["B"]
dfStacked["C"] = df2["C"]
dfStacked["D"] = df2["D"]
dfStacked["A_diff"] = dfDiff["A"]
dfStacked["B_diff"] = dfDiff["B"]
dfStacked["C_diff"] = dfDiff["C"]
dfStacked["D_diff"] = dfDiff["D"]

dfStacked = dfStacked.set_index('N')

colors = ['red', 'slateblue', 'lightseagreen', 'orange']
colors_c = ['darkred', 'blue', 'darkgreen', 'darkorange']

ax = dfStacked.filter(like='A').plot.bar(stacked=True, position=2, width=.1, color=[colors[0], colors_c[0]], edgecolor='w', alpha=.8)
dfStacked.filter(like='B').plot.bar(stacked=True, ax=ax, position=1, width=.1, color=[colors[1], colors_c[1]], edgecolor='w', alpha=.8)
dfStacked.filter(like='C').plot.bar(stacked=True, ax=ax, position=0, width=.1, color=[colors[2], colors_c[2]], edgecolor='w', alpha=.8)
dfStacked.filter(like='D').plot.bar(stacked=True, ax=ax, position=-1, width=.1, color=[colors[3], colors_c[3]], edgecolor='w', alpha=.8)
ax.set_xlim(-.5,3.5)

plt.legend(loc='upper center', ncol=4, bbox_to_anchor=(.5, 1.2))
plt.show()

Output: a bar chart with four stacked bars (A to D) grouped at each N (image not reproduced).
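The question also asks how to keep one color per column and hatch only the top segment, which the code above does not cover. A possible sketch, assuming the same position/width layout and relying on the fact that each stacked call adds two bar containers to the Axes (base first, then the top segment). The names pair, diff and top_containers are introduced here for illustration only:

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame([[2,5,7,6,7],[4,4,4,4,3],[8,8,7,3,4],[16,10,12,13,16]],
                   columns=["N", "A", "B", "C", "D"]).set_index("N")
df2 = pd.DataFrame([[2,1,3,6,5],[4,1,2,3,2],[8,2,2,3,3],[16,8,10,3,11]],
                   columns=["N", "A", "B", "C", "D"]).set_index("N")
diff = (df1 - df2).add_suffix("_diff")

colors = ['#6c8ebf', '#82b366', '#F7A01D', '#9876a7']  # colors from the question

fig, ax = plt.subplots()
for i, (name, color) in enumerate(zip(df1.columns, colors)):
    # Two-column frame: base value and the difference stacked on top of it
    pair = pd.concat([df2[name], diff[f"{name}_diff"]], axis=1)
    pair.plot.bar(stacked=True, ax=ax, position=2 - i, width=0.1,
                  color=[color, color])

# Containers alternate base, top, base, top, ... so every second one is a top segment
top_containers = ax.containers[1::2]
for container in top_containers:
    for patch in container:
        patch.set_hatch("/")

ax.set_xlim(-0.5, 3.5)
# Rebuild the legend after hatching so the handles pick up the hatch
ax.legend(loc="upper center", ncol=4, bbox_to_anchor=(0.5, 1.2))
```

Call plt.show() (or fig.savefig) afterwards to see the result. Both segments of a bar share one color here, so the hatch is the only thing that distinguishes the difference from the base.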

Upvotes: 1

Vitalizzare

Reputation: 7230

I'll start from df1, df2 and get dfStacked in a slightly different way:

import pandas as pd

df1 = pd.DataFrame(
    [
        [2,5,7,6,7],
        [4,4,4,4,3],
        [8,8,7,3,4],
        [16,10,12,13,16]
    ], 
    columns=["N", "A", "B", "C", "D"]
).set_index('N')

df2 = pd.DataFrame(
    [
        [2,1,3,6,5],
        [4,1,2,3,2],
        [8,2,2,3,3],
        [16,8,10,3,11]
    ], 
    columns=["N", "A", "B", "C", "D"]
).set_index('N')

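# Use df2 as the base so that base + diff adds up to df1,
# matching the dfStacked built in the question.
dfStacked = pd.concat(
    [df2, df1-df2], 
    axis=1, 
    keys=['raw','diff']
).reorder_levels([1,0], axis=1)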

Now we have this DataFrame:

figure
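Since the figure is not reproduced here, the column layout can be checked on a toy example. This is only a sketch with made-up values; base, top, wide and swapped are illustrative names, not part of the answer's code:

```python
import pandas as pd

# Made-up values, only to show the column structure
base = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=pd.Index([2, 4], name="N"))
top = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=base.index)

# keys= adds an outer column level: ('raw', 'A'), ('raw', 'B'), ('diff', 'A'), ...
wide = pd.concat([base, top], axis=1, keys=["raw", "diff"])

# reorder_levels swaps the two levels but leaves the column order as it was
swapped = wide.reorder_levels([1, 0], axis=1)
print(list(swapped.columns))
# [('A', 'raw'), ('B', 'raw'), ('A', 'diff'), ('B', 'diff')]

# Selecting a name returns a sub-frame with both parts of that bar
print(list(swapped["A"].columns))
# ['raw', 'diff']
```

With the name on the outer level, dfStacked[name]['raw'] and dfStacked[name]['diff'] pick out the two parts of one bar.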

To draw this data as a bar chart stacked by the first level, we can use two parameters accepted by DataFrame.plot.bar: ax and bottom. The first is the Axes on which the bars are drawn; the second gives the values at which the bottom edge of each bar starts. For details, run help(plt.bar) to read about bottom and help(pd.DataFrame.plot) to read about ax.
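To see what bottom does in isolation, here is a minimal sketch in plain matplotlib with made-up values (x, base and extra are illustrative names). The second call draws its bars starting at the tops of the first ones:

```python
import matplotlib.pyplot as plt

x = [0, 1, 2]
base = [1, 2, 3]    # made-up heights of the lower segments
extra = [2, 1, 4]   # made-up heights of the upper segments

fig, ax = plt.subplots()
lower = ax.bar(x, base, width=0.4, color="tab:blue", label="base")
# bottom=base shifts each upper bar so that it starts where the lower one ends
upper = ax.bar(x, extra, width=0.4, bottom=base, color="tab:blue",
               hatch="/", label="extra")
ax.legend()
```

Each upper bar therefore spans from base[i] to base[i] + extra[i], which is what stacked=True would otherwise compute automatically.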

import matplotlib.pyplot as plt
from matplotlib.colors import TABLEAU_COLORS

plt.figure(figsize=(10,7))
ax = plt.gca()

names = dfStacked.columns.levels[0]
n = len(names)
color = iter(TABLEAU_COLORS)
w = 1/(n+2)       # width
h = '/'*5         # hatch for diff values
for i, name in enumerate(names):
    c = next(color)   # color
    p = n/2 - i       # position
    dfStacked[name]['raw'].plot.bar(
        ax=ax, 
        position=p, 
        width=w, 
        color=c,
        label=f'{name} raw'
    )
    dfStacked[name]['diff'].plot.bar(
        ax=ax, 
        bottom=dfStacked[name]['raw'], 
        hatch=h,
        position=p,
        width=w, 
        color=c,
        label=f'{name} diff'
    )

ax.set_xlim([-1, n])
ax.tick_params(axis='x', rotation=0)
ax.legend()
plt.show()

And here's the output:

figure
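On the long-format layout suggested in the question: it can probably be converted to the two-level layout used above with DataFrame.pivot. This is a sketch under the assumption that each (N, Algorithm) pair occurs exactly once; the df3 in the question repeats pairs, which pivot rejects with a ValueError. The frame long_df and its column names raw and diff are hypothetical, with values taken from df2 and df1 - df2 for N = 2 and 4:

```python
import pandas as pd

# Hypothetical long-format frame: one row per (N, Algorithm) pair
long_df = pd.DataFrame({
    "N":         [2, 2, 4, 4],
    "Algorithm": ["A", "B", "A", "B"],
    "raw":       [1, 3, 1, 2],   # df2 values
    "diff":      [4, 4, 3, 2],   # df1 - df2 values
})

# Passing a list to values= gives two-level columns: (value name, Algorithm)
wide = long_df.pivot(index="N", columns="Algorithm", values=["raw", "diff"])

# Put the algorithm name on the outer level, as in dfStacked
wide = wide.reorder_levels([1, 0], axis=1)

print(wide["A"]["raw"].tolist())   # [1, 1]
print(wide["B"]["diff"].tolist())  # [4, 2]
```

The resulting frame can then go through the same plotting loop, so the long format is mostly a matter of which shape is easier to produce upstream.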

Upvotes: 3
