Reputation: 767
How to detach the height of stacked bars from the fill colors?
I have multiple categories that I want to present in a stacked bar chart, so that the height represents the value and the color is conditionally defined by another variable (something like fill= in ggplot).
I am new to bokeh and struggling with the stacked bar chart mechanics. I tried to construct this type of chart, but I haven't got anything except all sorts of errors. The examples of stacked bar charts are very limited in the bokeh documentation.
My data is stored in a pandas dataframe:
data = [
    ['A', 1, 15, 1],
    ['A', 2, 14, 2],
    ['A', 3, 60, 1],
    ['B', 1, 15, 2],
    ['B', 2, 25, 2],
    ['B', 3, 20, 1],
    ['C', 1, 15, 1],
    ['C', 2, 25, 1],
    ['C', 3, 55, 2],
    ...
]
The columns represent Category, Regime, Value, State.
I want to plot Category on the x axis, with Regimes stacked on the y axis, where bar length represents Value and color represents State.
Is this achievable in bokeh? Can anybody demonstrate, please?
Upvotes: 2
Views: 2929
Reputation: 7384
I think this problem becomes much easier if you transform your data to the following form:
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.transform import stack, factor_cmap
import pandas as pd

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})

p = figure(x_range=["a", "b"])
p.vbar_stack(["Regime1_Value", "Regime2_Value", "Regime3_Value"],
             x="Category",
             fill_color=[
                 factor_cmap(state, palette=["red", "green"], factors=["A", "B"])
                 for state in ["Regime1_State", "Regime2_State", "Regime3_State"]],
             line_color="black",
             width=0.9,
             source=df)
show(p)
The vbar_stack call above is a bit strange, because vbar_stack behaves unlike a "normal" glyph. Normally, when we want to plot n dots/rectangles/shapes/things, each attribute of a renderer can be given as a single value used for all n glyphs, as a column name (source[column_name] must produce an "array" of length n), or as literal data of length n.
But vbar_stack does not create one renderer; it creates as many as there are elements in the first array you give. Let's call this number k. A list of length k given for an attribute is split up so that each of the k renderers gets one element, and those elements are then looked up as column names in the source.
So p.vbar(x=[a,b,c]) and p.vbar_stack(x=[a,b,c]) actually do different things (the first gives literal data, the second gives column names), which is confusing and not clear from the documentation.
But why do we have to transform your data so strangely? Let's unroll vbar_stack and write it on our own (details left out for brevity):
plotted_regimes = []
for regime in regimes:
    if not plotted_regimes:
        bottom = 0
    else:
        bottom = stack(*plotted_regimes)
    p.vbar(bottom=bottom, top=stack(*plotted_regimes, regime))
    plotted_regimes.append(regime)
So for each regime we have a separate vbar whose bottom sits where the sum of the previously plotted regimes ended. With the original data structure this is not really possible, because there doesn't need to be a value for each regime in each category. In the transformed layout every category has a column for every regime, so a missing value has to be set to 0 explicitly.
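One way to get from the long layout in the question to the wide layout used here is a pandas pivot. This is only a sketch: it assumes the column names Category, Regime, Value and State from the question, uses just the rows that the question shows, and the helper name to_wide is made up for illustration. State is converted to strings because factor_cmap works on string factors.

```python
import pandas as pd

# Long-format rows as shown in the question (the question elides further rows).
long_df = pd.DataFrame(
    [["A", 1, 15, 1], ["A", 2, 14, 2], ["A", 3, 60, 1],
     ["B", 1, 15, 2], ["B", 2, 25, 2], ["B", 3, 20, 1],
     ["C", 1, 15, 1], ["C", 2, 25, 1], ["C", 3, 55, 2]],
    columns=["Category", "Regime", "Value", "State"],
)


def to_wide(df):
    """Hypothetical helper: one row per Category, one Value/State column pair per Regime."""
    df = df.assign(State=df["State"].astype(str))
    # A (Category, Regime) pair missing from the data becomes NaN in the pivot;
    # for the bar heights we replace that with 0 so the stack stays well defined.
    values = df.pivot(index="Category", columns="Regime", values="Value").fillna(0)
    states = df.pivot(index="Category", columns="Regime", values="State")
    values.columns = [f"Regime{r}_Value" for r in values.columns]
    states.columns = [f"Regime{r}_State" for r in states.columns]
    return pd.concat([values, states], axis=1).reset_index()


wide = to_wide(long_df)
```

The resulting wide frame has one Regime<i>_Value and one Regime<i>_State column per regime, so it can be passed as source in the same way as df in the example. With these State values the factors would be ["1", "2"] instead of ["A", "B"].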
Because the stacked values correspond to column names, we have to put these values in one dataframe. The vbar_stack call in the beginning could also be written with stack (basically because vbar_stack is a convenience wrapper around stack).
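For reference, here is a fuller version of the loop above that fills in the left-out details. Treat it as a sketch of how the vbar_stack call could be expressed with stack, not as what bokeh does internally line by line; the variable names regimes, plotted and renderers are only illustrative.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, stack

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4], "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5], "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6], "Regime3_State": ["B", "A"]})
source = ColumnDataSource(df)  # one shared source for all renderers

regimes = ["Regime1", "Regime2", "Regime3"]
p = figure(x_range=["a", "b"])

plotted = []    # value columns that are already stacked below the current bar
renderers = []
for regime in regimes:
    value_col = f"{regime}_Value"
    renderer = p.vbar(
        x="Category",
        bottom=stack(*plotted),           # sum of the columns plotted so far
        top=stack(*plotted, value_col),   # that sum plus the current column
        width=0.9,
        line_color="black",
        # color each segment by the State column that belongs to this regime
        fill_color=factor_cmap(f"{regime}_State",
                               palette=["red", "green"], factors=["A", "B"]),
        source=source)
    renderers.append(renderer)
    plotted.append(value_col)
```

Calling show(p) afterwards (with show imported from bokeh.io) should display the same chart as the vbar_stack version.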
The factor_cmap is used so that we don't have to manually assign colors. We could also simply add a Regime1_Color column, but this way the mapping is done automatically (and client side).
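If you prefer the explicit color columns, they could be built in pandas roughly like this. The state_colors dictionary and the Regime<i>_Color column names are assumptions made for this sketch; they mirror the palette used in the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4], "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5], "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6], "Regime3_State": ["B", "A"]})

# Assumed mapping from state to color, same colors as the factor_cmap palette.
state_colors = {"A": "red", "B": "green"}

for regime in ["Regime1", "Regime2", "Regime3"]:
    # Look up the color of every row's state; the mapping happens in Python,
    # before the data is sent to the browser.
    df[f"{regime}_Color"] = df[f"{regime}_State"].map(state_colors)
```

The color column names would then be passed as the fill_color list in place of the factor_cmap objects. I have not checked this variant against every bokeh version, so take it as a starting point.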
Upvotes: 1