S.K.

Reputation: 365

Histogram with slider filter

I would like to create a histogram combined with a density plot in Bokeh, with a slider filter. At the moment I have the building blocks for a Bokeh histogram with a density plot, taken from another thread. I don't know how to write the callback function that updates the data and re-renders the plot.

from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg as df

from numpy import histogram, linspace
from scipy.stats import gaussian_kde

pdf = gaussian_kde(df.hp)

x = linspace(0,250,50)

p = figure(plot_height=300)  # on Bokeh 3.0+, use height=300 instead
p.line(x, pdf(x))

# plot actual hist for comparison
hist, edges = histogram(df.hp, density=True, bins=20)
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], alpha=0.4)

show(p)

Upvotes: 0

Views: 1513

Answers (1)

Alex

Reputation: 579

There are two ways to implement callbacks in Bokeh:

  • with JS code. In that case the plot remains a standalone object; the constraint is that any data manipulation has to be done in JavaScript, so scipy can't be called from such a callback (there is a small caveat to that statement, but it is not relevant here).
  • by having the callback executed by a Bokeh server, in which case you have the full arsenal of Python available to you. The cost is that there is a bit more to running and distributing the plot than in the first case (but it's not difficult, see the example).

Since you need to refit the KDE every time the filter condition changes, the second way is the only practical option (unless you want to do the fitting in JavaScript...).
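To make the "refit" point concrete, here is a minimal pure-Python sketch of a 1-D Gaussian KDE. It is not SciPy's implementation; it assumes Scott's rule for the bandwidth (which, as far as I know, is the gaussian_kde default), and the names scott_bandwidth and kde_pdf as well as the sample values are made up for illustration. Both the kernel centers and the bandwidth come from the data, so a curve fitted on the full dataset can't simply be reused after filtering.

```python
import math
import statistics

def scott_bandwidth(values):
    """Scott's rule for 1-D data: sample std * n**(-1/5)."""
    n = len(values)
    return statistics.stdev(values) * n ** (-1 / 5)

def kde_pdf(values, x):
    """Evaluate a Gaussian KDE fitted on `values` at the point `x`."""
    h = scott_bandwidth(values)
    norm = len(values) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) / norm

# Hypothetical horsepower values, not the real autompg data
all_hp = [65, 70, 88, 90, 95, 110, 150, 165, 175, 200]
small_engines = [hp for hp in all_hp if hp < 100]

# Same x, different density once the filter changes the underlying sample
print(kde_pdf(all_hp, 90))
print(kde_pdf(small_engines, 90))
```

In the Bokeh app the same thing happens inside update: every change of the filter produces a new sample, hence a new gaussian_kde object.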

Here is how you would do it (example with a filter on cyl):

from bokeh.application import Application
from bokeh.application.handlers import FunctionHandler
from bokeh.io import output_notebook, show
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Select
from bokeh.sampledata.autompg import autompg as df

from numpy import histogram, linspace
from scipy.stats import gaussian_kde

output_notebook()

def modify_doc(doc):
    x = linspace(0,250,50)

    source_hist = ColumnDataSource({'top': [], 'left': [], 'right': []})
    source_kde = ColumnDataSource({'x': [], 'y': []})

    p = figure(plot_height=300)  # on Bokeh 3.0+, use height=300 instead
    p.line(x='x', y='y', source=source_kde)
    p.quad(top='top', bottom=0, left='left', right='right', alpha=0.4, source=source_hist)

    def update(attr, old, new):
        if new == 'All':
            filtered_df = df
        else:
            condition = df.cyl == int(new)
            filtered_df = df[condition]

        hist, edges = histogram(filtered_df.hp, density=True, bins=20)
        pdf = gaussian_kde(filtered_df.hp)

        source_hist.data = {'top': hist, 'left': edges[:-1], 'right': edges[1:]}
        source_kde.data = {'x': x, 'y': pdf(x)}

    update(None, None, 'All')

    select = Select(title='# cyl', value='All', options=['All'] + [str(i) for i in df.cyl.unique()])
    select.on_change('value', update)
    doc.add_root(column(select, p))

# To run it in the notebook:
plot = Application(FunctionHandler(modify_doc))
show(plot)

# Or to run it stand-alone with `bokeh serve --show myapp.py`
# in which case you need to remove the `output_notebook()` call
# from bokeh.io import curdoc
# modify_doc(curdoc())

A few notes:

  • this is made to be run in a Jupyter notebook (see the output_notebook() call and the two uncommented lines at the end).
  • to run it outside a notebook, comment out the notebook lines (see above) and uncomment the last two lines. Then you can run it from the command line with bokeh serve.
  • Select only handles str values, so you need to convert on the way in (when creating the options) and on the way out (when reading the values old and new).
  • for multiple filters, you need to read the state of every Select at the same time. To do that, instantiate the Selects before defining the update function (without attaching any callbacks yet!) and keep a reference to each one; inside update, read the current choice with your_ref.value and build your condition from that. After update is defined, attach the callback to each Select.
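The last two notes boil down to plain string handling, which can be sketched without Bokeh. The helper names (to_options, parse_choice, row_matches) and the sample rows are made up for illustration; in the real app the strings would come from each Select's value.

```python
ALL = 'All'

def to_options(values):
    """Build Select-style options: 'All' followed by the values as strings."""
    return [ALL] + [str(v) for v in sorted(set(values))]

def parse_choice(choice):
    """Convert a widget string back to an int, or None when no filter applies."""
    return None if choice == ALL else int(choice)

def row_matches(row, choices):
    """Check one row against the current state of every selector at once."""
    for column, choice in choices.items():
        wanted = parse_choice(choice)
        if wanted is not None and row[column] != wanted:
            return False
    return True

# Hypothetical rows standing in for the dataframe
rows = [
    {'cyl': 4, 'origin': 1, 'hp': 90},
    {'cyl': 4, 'origin': 3, 'hp': 70},
    {'cyl': 8, 'origin': 1, 'hp': 165},
]

# In the app this dict would be rebuilt from the .value of each Select
# every time the callback fires
state = {'cyl': '4', 'origin': ALL}
filtered = [row for row in rows if row_matches(row, state)]
print(to_options(row['cyl'] for row in rows))
print([row['hp'] for row in filtered])
```

The point of reading every selector inside the callback is that the new argument only carries the value of the widget that just changed, not the state of the others.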

Finally, an example with multiple selects:

def modify_doc(doc):
    x = linspace(0,250,50)

    source_hist = ColumnDataSource({'top': [], 'left': [], 'right': []})
    source_kde = ColumnDataSource({'x': [], 'y': []})

    p = figure(plot_height=300)  # on Bokeh 3.0+, use height=300 instead
    p.line(x='x', y='y', source=source_kde)
    p.quad(top='top', bottom=0, left='left', right='right', alpha=0.4, source=source_hist)
    select_cyl = Select(title='# cyl', value='All', options=['All'] + [str(i) for i in df.cyl.unique()])
    select_ori = Select(title='origin', value='All', options=['All'] + [str(i) for i in df.origin.unique()])

    def update(attr, old, new):
        # Start from the full frame and narrow it once per active filter;
        # the current state is read from both Selects, not from `new`
        filtered_df = df
        if select_cyl.value != 'All':
            filtered_df = filtered_df[filtered_df.cyl == int(select_cyl.value)]
        if select_ori.value != 'All':
            filtered_df = filtered_df[filtered_df.origin == int(select_ori.value)]

        hist, edges = histogram(filtered_df.hp, density=True, bins=20)
        pdf = gaussian_kde(filtered_df.hp)

        source_hist.data = {'top': hist, 'left': edges[:-1], 'right': edges[1:]}
        source_kde.data = {'x': x, 'y': pdf(x)}

    update(None, None, 'All')

    select_ori.on_change('value', update)
    select_cyl.on_change('value', update)

    doc.add_root(column(select_ori, select_cyl, p))
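One thing to watch for with several filters: some combinations may leave very few rows, or none, and as far as I know gaussian_kde raises an error when it gets fewer than two points or points that are all identical. Below is a minimal sketch of a guard, with a hypothetical helper name (can_fit_kde). You would call it on filtered_df.hp at the top of update and skip the refit, for example by resetting both sources to empty lists, when it returns False.

```python
def can_fit_kde(values):
    """Return True if there are at least two distinct values to fit a KDE on."""
    return len(set(values)) >= 2

# Hypothetical filter results
print(can_fit_kde([]))            # no rows left after filtering
print(can_fit_kde([90, 90, 90]))  # rows left, but zero variance
print(can_fit_kde([90, 110]))     # enough spread to estimate a density
```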

Upvotes: 1
