Tom

Reputation: 115

stacked bar chart sorted by max to min values

Problem: I want to make a stacked bar chart in which the segments of each bar are sorted by value, with the largest at the bottom and the smallest at the top.

I have a DataFrame with a time series along the rows (one row per hour) and 12 different categories along the columns, something like this:

import numpy as np
import pandas as pd
bz = ['NO1', 'NO2', 'NO3', 'NO4', 'NO5', 'DK1', 'DK2', 'FI', 'SE1', 'SE2', 'SE3', 'SE4']
df = pd.DataFrame(columns=bz, index=range(0, 24, 1), data=np.random.randint(0, 100, size=(24, 12)))

I was unable to sort and plot the values in the same line of code, so I resorted to hard-coding each hour and sorting it from highest to lowest value, like so:

hour1 = df.loc[0].sort_values(ascending=False)
hour2 = df.loc[1].sort_values(ascending=False)
hour3 = df.loc[2].sort_values(ascending=False)
...

But then I couldn't figure out how to plot them in a stack properly.

Desired outcome:

Each category in bz is stacked and sorted by value (max at the bottom, min at the top) for each successive hour, with each x-value being one of the variables hour1, hour2, etc.

Upvotes: 0

Views: 497

Answers (2)

jared

Reputation: 9146

I'm not aware of any simple way to do this. That said, I was able to get the desired result (at least what I think your desired result is).

First, I created a numpy array of the sorted indices for each of the rows. Then I looped through the data frame (generally bad practice) and created the stacked bar chart. The stacking is done by ordering the row according to the sort and computing its cumulative sum; prepending 0 and dropping the last value gives the bottom of each segment. The colors are then set by the ordering, to ensure each column is colored the same in every loop iteration.
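
As a minimal sketch of the ordering and stacking step on a single row (the four category names and values below are made up for illustration, not taken from the question's data):

```python
import numpy as np

# Hypothetical values for one hour across four categories.
names = np.array(["NO1", "NO2", "DK1", "FI"])
values = np.array([30, 80, 10, 50])

# Negating the values makes argsort return positions from largest to smallest.
order = np.argsort(-values)
heights = values[order]  # segment heights, largest first

# Each segment starts where the ones below it end: a cumulative sum
# with 0 prepended and the final total dropped.
bottoms = np.concatenate(([0], np.cumsum(heights)[:-1]))

for name, height, bottom in zip(names[order], heights, bottoms):
    print(f"{name}: from {bottom} to {bottom + height}")
```

The largest value sits at the bottom of the bar (starting at 0) and every later segment begins at the running total of the ones before it.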

Because of this "hacky" method, you also need to create a custom legend, which I did by following the matplotlib tutorial on that. I also had to move the legend outside so it didn't block the data, which I did following this answer. For the tick marks, I set them to say the corresponding hour in the format "hourXX", where "XX" is a 0-padded number starting at 01.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np

rng = np.random.default_rng(10)

bz = ["NO1", "NO2", "NO3", "NO4", "NO5", "DK1",
      "DK2", "FI", "SE1", "SE2", "SE3", "SE4"]
df = pd.DataFrame(columns=bz,
                  index=range(0, 24, 1),
                  data=rng.integers(0, 100, size=(24, 12)))
# Column positions of each row's values, ordered from largest to smallest.
sorted_indices = np.argsort(-df.to_numpy(), axis=1)

cmap = plt.get_cmap("tab20")
fig, ax = plt.subplots()
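for index, row in df.iterrows():
    # Reorder this hour's values from largest to smallest.
    row = row.iloc[sorted_indices[index]]
    # Each segment starts at the running total of the segments below it.
    bottoms = np.hstack(([0], np.cumsum(row)[:-1]))
    # Index the colormap by original column position so each category keeps its color.
    ax.bar(index, row, bottom=bottoms, color=cmap(sorted_indices[index]))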

ticks = [f"hour{n:02}" for n in range(1, len(df)+1)]
ax.set_xticks(np.arange(len(df)), ticks, rotation=90)

legend_elements = []
for index, name in enumerate(bz):
    legend_elements.append(Patch(facecolor=cmap(index), label=name))

ax.legend(handles=legend_elements, bbox_to_anchor=(1.04, 1), loc="upper left")
fig.tight_layout()
plt.show()
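
If the row-by-row loop is a concern, the same sorting and stacking can be done on the whole array at once. The following is only a sketch of that variation, not a replacement for the code above: it loops over the 12 stack levels instead of the 24 rows, and the names `values`, `order`, `heights`, and `rank` are my own choices. The legend and tick labels would be added exactly as before.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit to use an interactive one
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
bz = ["NO1", "NO2", "NO3", "NO4", "NO5", "DK1",
      "DK2", "FI", "SE1", "SE2", "SE3", "SE4"]
df = pd.DataFrame(rng.integers(0, 100, size=(24, 12)), columns=bz)

values = df.to_numpy()
# For every row, the column positions ordered from largest to smallest value.
order = np.argsort(-values, axis=1)
# Each row of `heights` is the matching row of `values` in descending order.
heights = np.take_along_axis(values, order, axis=1)
# A segment starts at the sum of the segments below it.
bottoms = np.cumsum(heights, axis=1) - heights

cmap = plt.get_cmap("tab20")
fig, ax = plt.subplots()
x = np.arange(len(df))
for rank in range(values.shape[1]):
    # One bar call per stack level; colors follow the original column position.
    ax.bar(x, heights[:, rank], bottom=bottoms[:, rank],
           color=cmap(order[:, rank]))
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.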

Upvotes: 1

not_speshal

Reputation: 23166

IIUC, you need to create a new DataFrame in which each column is sorted on its own, and then plot it as a stacked bar chart:

>>> pd.concat([df[col].sort_values(ignore_index=True) for col in df.columns],axis=1).plot.bar(stacked=True)
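
To see what that expression builds before plotting, here is a small sketch on a made-up two-column frame (the name `small` and its values are assumptions, not from the question). Each column is sorted in ascending order independently, so a row of the result no longer corresponds to a single original hour.

```python
import pandas as pd

# Hypothetical frame: three hours, two categories.
small = pd.DataFrame({"NO1": [30, 10, 20], "DK1": [5, 25, 15]})

# Same expression as above, without the plotting call.
by_column = pd.concat(
    [small[col].sort_values(ignore_index=True) for col in small.columns],
    axis=1,
)
print(by_column)
```

In this example the first row of `by_column` holds 10 and 5, which came from different hours of `small`.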


Upvotes: 0
