Reputation: 457
I am struggling with making a bokeh plot that shows the result of a grouped dataframe. The following is the issue.
I have some data from a dataframe:
data = pd.read_csv('CompanyStructure.csv', index_col = 0)
which looks like the following and contains thousands more rows:
I would like to visualize this dataframe after grouping by three variables. It could just as well be a grouping by two variables or one. Below is an example where I group by all of the first three columns:
grouped = data.groupby(by=['hour', 'Code', 'Type']).sum()
And the grouped frame looks like the following:
Now I would like to visualize this. The following is my approach:
source = ColumnDataSource(data=grouped)
p = figure(x_range = source.data['hour_Code_Type'].tolist())
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)
Then I get the following error:
ValueError: Unrecognized range input: '[(0, 'DK1', 'A'), (0, 'DK1', 'P'), (0, 'DK1', 'T'), (0, 'DK2', 'A'), (0, 'DK2', 'P'), (0, 'DK2', 'T'), (1, 'DK1', 'A'), (1, 'DK1', 'P'), (1, 'DK1', 'T'), (1, 'DK2', 'A'), (1, 'DK2', 'P'), (1, 'DK2', 'T'), (2, 'DK1', 'A'), (2, 'DK1', 'P'), (2, 'DK1', 'T'), (2, 'DK2', 'A'), (2, 'DK2', 'P'), (2, 'DK2', 'T'), (3, 'DK1', 'A'), (3, 'DK1', 'P'), (3, 'DK1', 'T'), (3, 'DK2', 'A'), (3, 'DK2', 'P'), (3, 'DK2', 'T'), (4, 'DK1', 'A'), (4, 'DK1', 'P'), (4, 'DK1', 'T') ...
I do understand the error, but I cannot figure out how to solve it. How can I make the x_range display values like the ones shown? My ideal tool is an interactive one (which is why I am using bokeh) that draws a bar chart depending on which variables are chosen for the grouping.
I hope that someone can help me out.
Upvotes: 0
Views: 1683
Reputation: 46
EDIT: As bigreddot pointed out, FactorRange could be used to avoid having string tuples as categoricals.
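from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show
# df is the dataframe loaded in the question (called data there)
df['hour'] = df['hour'].astype(str) # To get Tuple(String, String, String) when grouping
grouped = df.groupby(by=['hour', 'Code', 'Type']).sum()
# The MultiIndex ends up in the source as a single column named 'hour_Code_Type'
source = ColumnDataSource(data=grouped)
p = figure(x_range=FactorRange(*source.data['hour_Code_Type'].tolist()))
p.vbar(x='hour_Code_Type', top='Value', source=source)
show(p)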
This renders
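Since the question also mentions grouping by one or two variables: as far as I know, FactorRange takes plain strings for a single level and tuples of strings for two or three levels, so the group keys need slightly different handling depending on how many columns are grouped. Below is a minimal sketch of that conversion. Note that to_factors is a hypothetical helper (it is not part of bokeh or pandas), and the keys are assumed to be what you get from iterating over grouped.index.

```python
def to_factors(keys):
    """Turn group keys into factors for a categorical range.

    Hypothetical helper, not part of bokeh or pandas. Scalars become
    strings; tuples become tuples of strings (one entry per level).
    """
    factors = []
    for key in keys:
        if not isinstance(key, tuple):
            factors.append(str(key))              # grouped by one column
        elif len(key) == 1:
            factors.append(str(key[0]))           # one-element tuple -> plain string
        elif len(key) <= 3:
            factors.append(tuple(str(part) for part in key))
        else:
            raise ValueError("at most three nesting levels are assumed here")
    return factors


# Keys as they would appear in grouped.index for three grouping columns
print(to_factors([(0, 'DK1', 'A'), (0, 'DK1', 'P')]))
# Keys for a single grouping column
print(to_factors([0, 1, 2]))
```

The result could then be passed as FactorRange(*to_factors(grouped.index)). The same list would also need to be the x column in the ColumnDataSource, so that the bars and the range agree on the factors.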
Old answer:
When you group on 'hour', 'Code' and 'Type' you are creating a MultiIndex. As you want categoricals in the x-range to be a list of strings, one approach could be to create a new column that converts the MultiIndex to a string.
grouped = df.groupby(by=['hour', 'code', 'type']).sum()
# Each index entry is a tuple; str() turns it into one label such as "(0, 'DK2', 'A')"
grouped['group'] = [str(x) for x in grouped.index]
This gives this as the dataframe (using mock-data):
                value            group
hour code type
0    DK2  A       0.4  (0, 'DK2', 'A')
1    DK1  B       1.0  (1, 'DK1', 'B')
2    DK1  A       1.5  (2, 'DK1', 'A')
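If the tuple representation is too noisy as a tick label, the parts of each key could be joined into a shorter string instead. A small sketch of that idea; join_key and the " / " separator are my own choices for illustration, not anything bokeh or pandas provides.

```python
def join_key(key, sep=" / "):
    """Join the parts of one group key into a single label string."""
    parts = key if isinstance(key, tuple) else (key,)
    return sep.join(str(part) for part in parts)


# Keys as they would come out of grouped.index for the mock data
keys = [(0, 'DK2', 'A'), (1, 'DK1', 'B'), (2, 'DK1', 'A')]
labels = [join_key(key) for key in keys]
print(labels)  # ['0 / DK2 / A', '1 / DK1 / B', '2 / DK1 / A']
```

With the real frame this would be grouped['group'] = [join_key(k) for k in grouped.index], and it also works when grouping by a single column, where the index entries are not tuples.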
Then you could visualize value based on the 'group' column:
source = ColumnDataSource(data=grouped)
p = figure(x_range=source.data['group'].tolist())
p.vbar(x='group', top='value', source=source)
show(p)
To get:
Upvotes: 1