Reputation: 115
Problem: I want to make a stacked bar chart where the segments of each stack are sorted from maximum (bottom) to minimum (top).
I have a DataFrame that contains a time series in one direction and 12 different categories in the other, something like this:
bz = ['NO1','NO2','NO3','NO4','NO5','DK1','DK2','FI','SE1','SE2','SE3','SE4']
df = pd.DataFrame(columns = bz, index = range(0,24,1), data=np.random.randint(0,100,size=(24, 12)))
I was unable to sort and plot the values in a single line of code, so I resorted to hard-coding each hour and sorting it from highest to lowest value, like so:
hour1 = df.loc[0, :].sort_values(ascending=False)
hour2 = df.loc[1, :].sort_values(ascending=False)
hour3 = df.loc[2, :].sort_values(ascending=False)
...
But then I couldn't figure out how to plot them in a stack properly.
Desired outcome:
Each category in bz is stacked and sorted by value (max at the bottom, min at the top) for each successive hour, where each x-value is one of the variables hour1, hour2, etc.
Upvotes: 0
Views: 497
Reputation: 9146
I'm not aware of any simple way to do this. That said, I was able to get the desired result (at least what I think your desired result is).
First, I created a numpy array of the sorted indices for each of the rows. Then I looped through the data frame (generally bad practice) and created the stacked bar chart one hour at a time. For each hour, the row is reordered according to the sort, and the bottom of each segment is the cumulative sum of the reordered row with a 0 prepended and the last value dropped. The colors are indexed by the same ordering, so each column keeps the same color in every loop iteration.
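To make the cumulative-sum step concrete, here is a minimal sketch for a single hour. The three values are made up for illustration and are assumed to be sorted from largest to smallest already.

```python
import numpy as np

# Hypothetical values for one hour, already sorted from largest to smallest.
row = np.array([50, 30, 20])

# Each segment starts where the segments below it end: prepend 0 and drop
# the last cumulative sum, which is the total height of the bar.
bottoms = np.hstack(([0], np.cumsum(row)[:-1]))

print(bottoms.tolist())  # [0, 50, 80]
```

The largest value sits at 0, the next one starts at 50, and the smallest starts at 80, so the bar tops out at 100.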
Because of this "hacky" method, you also need to create a custom legend, which I did by following the matplotlib tutorial on that. I also had to move the legend outside the axes so it didn't block the data, which I did following this answer. For the tick marks, I set the labels to the corresponding hour in the format "hourXX", where "XX" is a zero-padded number starting at 01.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

rng = np.random.default_rng(10)

bz = ["NO1", "NO2", "NO3", "NO4", "NO5", "DK1",
      "DK2", "FI", "SE1", "SE2", "SE3", "SE4"]

df = pd.DataFrame(columns=bz,
                  index=range(0, 24, 1),
                  data=rng.integers(0, 100, size=(24, 12)))

# For every row (hour), the column positions ordered from largest to smallest value.
sorted_indices = np.argsort(-df.to_numpy(), axis=1)

cmap = plt.get_cmap("tab20")

fig, ax = plt.subplots()
for index, row in df.iterrows():
    # Reorder this hour's values so the largest comes first (bottom of the stack).
    row = row.iloc[sorted_indices[index]]
    # Each segment starts at the cumulative sum of the segments below it.
    bottoms = np.hstack(([0], np.cumsum(row)[:-1]))
    # Color by original column position so each category keeps its color.
    ax.bar(index, row, bottom=bottoms, color=cmap(sorted_indices[index]))

# Label the ticks "hour01", "hour02", ...
ticks = [f"hour{n:02}" for n in range(1, len(df) + 1)]
ax.set_xticks(np.arange(len(df)), ticks, rotation=90)

# Build the legend by hand, since the bars were drawn without labels.
legend_elements = []
for index, name in enumerate(bz):
    legend_elements.append(Patch(facecolor=cmap(index), label=name))

ax.legend(handles=legend_elements, bbox_to_anchor=(1.04, 1), loc="upper left")

fig.tight_layout()
plt.show()
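Since looping over the rows is generally discouraged, here is a possible variation that does the sorting and the cumulative sums for all hours at once with numpy, and then draws one `ax.bar` call per stacking position rather than one per hour. This is only a sketch of an alternative, not the method above; the names `values`, `order`, `sorted_vals` and `rank` are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

rng = np.random.default_rng(10)
bz = ["NO1", "NO2", "NO3", "NO4", "NO5", "DK1",
      "DK2", "FI", "SE1", "SE2", "SE3", "SE4"]
df = pd.DataFrame(rng.integers(0, 100, size=(24, 12)), columns=bz)

values = df.to_numpy()
# Per hour: column positions ordered from largest to smallest value.
order = np.argsort(-values, axis=1)
# The values themselves, rearranged into that order.
sorted_vals = np.take_along_axis(values, order, axis=1)
# Bottom of each segment: sum of all segments below it in the same hour.
bottoms = np.cumsum(sorted_vals, axis=1) - sorted_vals

cmap = plt.get_cmap("tab20")
fig, ax = plt.subplots()
x = np.arange(len(df))

# One call per stacking position (12 calls) instead of one per hour (24).
for rank in range(values.shape[1]):
    ax.bar(x, sorted_vals[:, rank], bottom=bottoms[:, rank],
           color=cmap(order[:, rank]))  # color follows the original column

ax.set_xticks(x, [f"hour{n:02}" for n in range(1, len(df) + 1)], rotation=90)
ax.legend(handles=[Patch(facecolor=cmap(i), label=name)
                   for i, name in enumerate(bz)],
          bbox_to_anchor=(1.04, 1), loc="upper left")
fig.tight_layout()
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure. The result should match the looped version, because the colors are still looked up by each value's original column position.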
Upvotes: 1
Reputation: 23166
IIUC, you need to create a new DataFrame that contains the sorted columns and then plot:
>>> pd.concat([df[col].sort_values(ignore_index=True) for col in df.columns],axis=1).plot.bar(stacked=True)
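To see what this expression builds before plotting, here is a small sketch on a made-up frame (the name `small` and its values are hypothetical). Each column is sorted on its own in ascending order and `ignore_index=True` replaces the original index with 0, 1, 2, ..., so a row of the result no longer corresponds to a single hour of the original data.

```python
import pandas as pd

# Hypothetical three-hour, two-category frame.
small = pd.DataFrame({"NO1": [3, 1, 2], "NO2": [10, 30, 20]})

# Same construction as above: sort every column independently, then
# glue the sorted columns back together side by side.
sorted_cols = pd.concat(
    [small[col].sort_values(ignore_index=True) for col in small.columns],
    axis=1,
)

print(sorted_cols)
```

If the goal is to order the categories within each hour, it may be worth checking this output against the desired chart before relying on it.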
Upvotes: 0