David Halley

Reputation: 457

Bokeh: plotting a grouped dataframe with a MultiIndex as a bar chart

I am struggling to make a Bokeh plot that shows the result of a grouped dataframe. Here is the issue.

I load some data into a dataframe:

import pandas as pd

data = pd.read_csv('CompanyStructure.csv', index_col=0)

It looks like the following and contains thousands more rows:

[Image: first rows of the raw data]

I would like to visualize this dataframe after grouping it by three variables, although it could just as well be grouped by two variables or by one. Below is an example where I group by the first three columns:

grouped = data.groupby(by=['hour', 'Code', 'Type']).sum()

The grouped frame looks like this:

[Image: the grouped dataframe]
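It helps to look at what the index of the grouped frame actually contains, since that is what ends up in x_range in the attempt below. A minimal sketch, using a small hypothetical frame in place of CompanyStructure.csv (the column names hour, Code, Type and Value are assumed from the question). As far as I can tell, Bokeh's figure() only turns a plain list into a categorical range when every entry is a string, and these entries are tuples that start with an int.

```python
import pandas as pd

# Hypothetical stand-in for CompanyStructure.csv (columns assumed from the question)
data = pd.DataFrame({
    "hour": [0, 0, 1, 1],
    "Code": ["DK1", "DK2", "DK1", "DK1"],
    "Type": ["A", "P", "A", "A"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

grouped = data.groupby(by=["hour", "Code", "Type"]).sum()

# Grouping by several columns produces a MultiIndex whose entries are tuples
print(type(grouped.index).__name__)  # MultiIndex
keys = grouped.index.tolist()
print(keys)

# None of the keys is a plain string, so they cannot be used as-is
# as a list of categorical factors
print(all(isinstance(k, str) for k in keys))  # False
```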

Now I would like to visualize this. The following is my approach:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

source = ColumnDataSource(data=grouped)
p = figure(x_range = source.data['hour_Code_Type'].tolist())
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)

Then I get the following error:

ValueError: Unrecognized range input: '[(0, 'DK1', 'A'), (0, 'DK1', 'P'), (0, 'DK1', 'T'), (0, 'DK2', 'A'), (0, 'DK2', 'P'), (0, 'DK2', 'T'), (1, 'DK1', 'A'), (1, 'DK1', 'P'), (1, 'DK1', 'T'), (1, 'DK2', 'A'), (1, 'DK2', 'P'), (1, 'DK2', 'T'), (2, 'DK1', 'A'), (2, 'DK1', 'P'), (2, 'DK1', 'T'), (2, 'DK2', 'A'), (2, 'DK2', 'P'), (2, 'DK2', 'T'), (3, 'DK1', 'A'), (3, 'DK1', 'P'), (3, 'DK1', 'T'), (3, 'DK2', 'A'), (3, 'DK2', 'P'), (3, 'DK2', 'T'), (4, 'DK1', 'A'), (4, 'DK1', 'P'), (4, 'DK1', 'T') ...

I understand the error, but I cannot figure out how to fix it. How can I make the x_range display categorical values like the ones shown above? Ideally I want an interactive tool (which is why I am using Bokeh) that draws a bar chart depending on which variables are chosen for the grouping.

I hope that someone can help me out.

Upvotes: 0

Views: 1683

Answers (1)

taul

Reputation: 46

EDIT: As bigreddot pointed out, FactorRange can be used directly, which avoids having to turn the tuples into strings to use them as categoricals.

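from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

# df is the dataframe loaded in the question.
# Factors must be strings, so convert the integer hour first; each group key
# then becomes a Tuple(String, String, String)
df['hour'] = df['hour'].astype(str)

grouped = df.groupby(by=['hour', 'Code', 'Type']).sum()
source = ColumnDataSource(data=grouped)

# Nested categorical axis built from the MultiIndex tuples
p = figure(x_range=FactorRange(*source.data['hour_Code_Type'].tolist()))
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)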
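Since the question asks for a chart that works for one, two or three grouping columns, the same idea can be wrapped in a helper. This is a minimal sketch, not a definitive implementation: the function names grouped_factors and plot_grouped are hypothetical, and the value column is assumed to be called Value as in the question. Bokeh nests categorical factors at most three levels deep, so `by` should hold one to three columns. Bokeh is imported inside plot_grouped only so that the data preparation can be used on its own.

```python
import pandas as pd


def grouped_factors(df, by, value="Value"):
    """Group df by the columns in `by`; return (factors, totals) for a bar chart.

    Bokeh categorical factors must be strings (one grouping column) or
    tuples of strings (nested categories), so every key is converted to str.
    """
    totals = df.groupby(by=list(by))[value].sum()
    if len(by) == 1:
        # A single grouping column gives a flat Index: plain string factors
        factors = [str(k) for k in totals.index]
    else:
        # Several grouping columns give a MultiIndex: tuples of strings
        factors = [tuple(str(part) for part in k) for k in totals.index]
    return factors, totals.tolist()


def plot_grouped(df, by, value="Value"):
    """Build a Bokeh bar chart of `value` summed over the chosen columns."""
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure

    factors, totals = grouped_factors(df, by, value)
    source = ColumnDataSource(data={"x": factors, "top": totals})
    p = figure(x_range=FactorRange(*factors))
    p.vbar(x="x", top="top", width=0.9, source=source)
    return p
```

Usage would be along the lines of show(plot_grouped(data, ['Code', 'Type'])). Converting the keys to strings after grouping keeps the hours in numeric order; converting the hour column with astype(str) before grouping sorts the hours as text, so "10" comes before "2".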

This renders:

[Image: the resulting bar chart]

Old answer:

Grouping on 'hour', 'Code' and 'Type' creates a MultiIndex. Since a plain list passed as a categorical x-range must contain only strings, one approach is to add a new column that converts each MultiIndex tuple to a string.

grouped = df.groupby(by=['hour', 'code', 'type']).sum()
# str() of each MultiIndex tuple gives a label such as "(0, 'DK2', 'A')"
grouped['group'] = [str(x) for x in grouped.index]

With some mock data, this gives the following dataframe:

                value            group
hour code type
0    DK2  A       0.4  (0, 'DK2', 'A')
1    DK1  B       1.0  (1, 'DK1', 'B')
2    DK1  A       1.5  (2, 'DK1', 'A')

You can then plot 'value' against the 'group' column:

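from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

source = ColumnDataSource(data=grouped)

# A plain list of strings is accepted as a categorical x_range
p = figure(x_range=source.data['group'].tolist())
p.vbar(x='group', top='value', source=source)
show(p)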

To get:

[Image: the resulting bar chart]
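If tuple-style labels such as "(0, 'DK2', 'A')" look noisy on the axis, the string column could instead join the parts of each key with a separator. A small sketch, reusing the mock column names from this answer (hour, code, type, value); the space separator is an arbitrary choice.

```python
import pandas as pd

# Mock data mirroring the table above
df = pd.DataFrame({
    "hour": [0, 1, 2],
    "code": ["DK2", "DK1", "DK1"],
    "type": ["A", "B", "A"],
    "value": [0.4, 1.0, 1.5],
})
grouped = df.groupby(by=["hour", "code", "type"]).sum()

# Join each MultiIndex tuple into one readable label such as "0 DK2 A"
grouped["group"] = [" ".join(map(str, key)) for key in grouped.index]
print(grouped["group"].tolist())  # ['0 DK2 A', '1 DK1 B', '2 DK1 A']
```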

Upvotes: 1
