Danny Yoon

Reputation: 25

Bokeh updating source using aggregate

I'm building a Bokeh dashboard with country data, where a line graph changes dynamically.

Users are able to select multiple countries using a CheckboxGroup.

I am able to subset the source table dynamically as I select/deselect countries.

After I subset, I aggregate the source table for the graph (grouping all countries by date), and this is where the problem occurs.

I understand that we have to use source=src directly, but I need to aggregate every time I update the source.

Is there any suggestion on how I can approach this issue?

Thanks!

def make_plot(src):
    temp = pd.DataFrame.from_dict(src.data)
    agg_date_full = ColumnDataSource(temp.groupby('date').sum().reset_index())
    fig1.line('date', 'y',source=agg_date_full)

def update(attr, old, new):
    country_to_plot = [country_checkbox.labels[i] for i in country_checkbox.active]
    new_src = make_dataset(country_to_plot)
    src.data.update(new_src.data)

country_checkbox = CheckboxGroup(labels=country_labels, active= list(range(0,len(country_labels))))
country_checkbox.on_change('active', update)

initial_countries = [country_checkbox.labels[i] for i in country_checkbox.active]

src = make_dataset(initial_countries)
    
p = make_plot(src)

Upvotes: 1

Views: 548

Answers (1)

gherka

Reputation: 1446

The answer depends on how you plan to deploy and use your dashboard.

If you can run a bokeh server then it's fairly straightforward to achieve the dynamic transformation of the data you describe.

We can get an example timeseries dataset with multiple countries from the World Bank using their API. From your description it should be close enough:

http://api.worldbank.org/v2/country/eas;ecs;lcn;mea;nac;sas;ssf/indicator/EN.ATM.CO2E.KT?source=2&downloadformat=csv

After a little bit of tidying up, the dataframe should look like this:

Country Name                Year    Value
East Asia & Pacific         1960    1.215380e+06
Europe & Central Asia       1960    4.583646e+06
Latin America & Caribbean   1960    3.024539e+05
Middle East & North Africa  1960    9.873685e+04
North America               1960    3.083749e+06
...
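
The tidying step itself is not shown here. As a rough sketch of one way to do it, assuming the download comes in a wide layout (one row per region, one column per year): the small inline table is made-up stand-in data, and `tidy` is a hypothetical helper name, not something from the World Bank or pandas.

```python
import pandas as pd

# Made-up stand-in for the downloaded file, in an assumed wide layout:
# one row per region and one column per year.
wide = pd.DataFrame({
    "Country Name": ["North America", "South Asia"],
    "Country Code": ["NAC", "SAS"],
    "1960": [10.0, 1.0],
    "1961": [11.0, None],
})

def tidy(wide_df):
    """Reshape the wide table into Country Name / Year / Value rows."""
    long_df = wide_df.drop(columns=["Country Code"]).melt(
        id_vars="Country Name", var_name="Year", value_name="Value"
    )
    long_df["Year"] = long_df["Year"].astype(int)
    # Drop years with no reported value, then order rows by year and region.
    long_df = long_df.dropna(subset=["Value"])
    return long_df.sort_values(["Year", "Country Name"]).reset_index(drop=True)

df = tidy(wide)
```

The real file may carry extra metadata rows and columns that need dropping first, so treat the column names here as assumptions to check against your download.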

Now the bokeh code. I've used the single module approach from the docs, but you can make it as complex as you'd like. Note that you should put this code in a .py file, not run it from a Jupyter notebook.

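import pandas as pd
from bokeh.layouts import row
from bokeh.models import CheckboxGroup, NumeralTickFormatter
from bokeh.plotting import figure, curdoc

# load the tidied dataframe shown above
# (the file name is a placeholder, not part of the original answer)
df = pd.read_csv("co2_emissions_tidy.csv")

# initial data: a single region, summed per year;
# x and y come from the same grouped result so they stay aligned
initial = (
    df[df["Country Name"] == "Europe & Central Asia"]
        .groupby("Year")["Value"]
        .sum()
)
initial_x = initial.index.values
initial_y = initial.values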

# create a plot and style its properties
p = figure(height=400, width=600, toolbar_location=None)
p.yaxis[0].formatter = NumeralTickFormatter(format="0.0a")
p.yaxis.axis_label = "CO2 emissions (kt)"
p.xaxis.axis_label = "Years"

# create line renderer
line = p.line(x=initial_x, y=initial_y, line_width=2)

ds = line.data_source

# create a callback that rebuilds the data source from the selected regions
def callback(attr, old, new):

    # new holds the indices of the currently ticked checkboxes
    selected = [checkbox_group.labels[i] for i in new]
    filtered = df[df["Country Name"].isin(selected)]
    # take x and y from the same grouped result so they stay aligned
    totals = filtered.groupby("Year")["Value"].sum()

    ds.data = {"x": totals.index.values, "y": totals.values}

# add checkboxes and the callback; tick the region that is plotted initially
labels = list(df["Country Name"].unique())
checkbox_group = CheckboxGroup(labels=labels, active=[labels.index("Europe & Central Asia")])
checkbox_group.on_change("active", callback)

# put the checkboxes and plot in a layout and add to the document
curdoc().add_root(row(checkbox_group, p))

Now, when you run bokeh serve --show app.py from your terminal, the dashboard opens in your browser.

When you click on different regions, their carbon emissions will be added up and plotted as one line.
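
To check the aggregation on its own, without starting a server, the filter-and-sum step from the callback can be pulled out into a plain function. This is only a minimal sketch with made-up numbers; `aggregate_selected` is a hypothetical name, not part of Bokeh or pandas.

```python
import pandas as pd

def aggregate_selected(df, selected):
    """Sum Value per Year over the selected regions, as a dict of x/y lists."""
    filtered = df[df["Country Name"].isin(selected)]
    totals = filtered.groupby("Year")["Value"].sum()
    # Take x and y from the same grouped result so they stay aligned.
    return {"x": totals.index.tolist(), "y": totals.tolist()}

# Made-up sample data in the same shape as the tidied table.
sample = pd.DataFrame({
    "Country Name": ["A", "B", "A", "B"],
    "Year": [1960, 1960, 1961, 1961],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

both = aggregate_selected(sample, ["A", "B"])
only_a = aggregate_selected(sample, ["A"])
none = aggregate_selected(sample, [])
```

A dict of this shape is what the callback assigns to ds.data, so the same function could be called from there. Deselecting every region gives two empty lists, which leaves the plot blank.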

Upvotes: 1
