user3490622

Reputation: 1011

Bokeh scatter plot: is it possible to overlay a line colored by category?

I have a dataframe that details sales of various product categories vs. time. I'd like to make a "line and marker" plot of sales vs. time, per category. To my surprise, this appears to be very difficult in Bokeh.

The scatter plot is easy. But then trying to overplot a line of sales vs. date with the same source (so I can update both scatter and line plots in one go when the source updates) and in such a way that the colors of the line match the colors of the scatter plot markers proves near impossible.

Minimal reproducible example with contrived data:

import pandas as pd

df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],\
                'Product Category':['shoes','shoes','grocery','grocery'],\
              'Sales':[100,180,21,22],'Colors':['red','red','green','green']})

df['Date'] = pd.to_datetime(df['Date'])

from bokeh.io import output_notebook
output_notebook()
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", plot_width=800, toolbar_location=None)

plot.scatter(x="Date",y="Sales",size=15, source=source, fill_color="Colors", fill_alpha=0.5, \
         line_color="Colors",legend="Product Category")

for cat in list(set(source.data['Product Category'])):  
    tmp = source.to_df()
    col = tmp[tmp['Product Category']==cat]['Colors'].values[0]                                                                                                          
    plot.line(x="Date",y="Sales",source=source, line_color=col)   

show(plot)

Here's what it looks like, which is clearly wrong:

Here's what I want and don't know how to make:

Can Bokeh not make such plots, where scatter markers and lines have the same color per category, with a legend?

Upvotes: 0

Views: 1985

Answers (2)

mosc9575

Reputation: 6367

The solution is to group your data. Then you can plot one line for each group.

Minimal Example

import pandas as pd
from bokeh.plotting import figure, show, output_notebook
output_notebook()

df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],
                   'Product Category':['shoes','shoes','grocery','grocery'],
                   'Sales':[100,180,21,22],'Colors':['red','red','green','green']})
df['Date'] = pd.to_datetime(df['Date'])

plot = figure(x_axis_type="datetime", 
              plot_width=400, 
              plot_height=400, 
              toolbar_location=None
             )
plot.scatter(x="Date",
             y="Sales",
             size=15, 
             source=df, 
             fill_color="Colors", 
             fill_alpha=0.5,
             line_color="Colors",
             legend_field="Product Category"
            )

for color in df['Colors'].unique():  
    plot.line(x="Date", y="Sales", source=df[df['Colors']==color], line_color=color)     

show(plot)

Output

plot with lines per groups
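A possible variation on the grouping step, as a sketch only: group by 'Product Category' rather than by color (so two categories that share a color still get separate lines) and sort each group by date so the line connects points in time order. The helper name split_by_category is hypothetical, and the sample data is deliberately out of date order to show the sort.

```python
import pandas as pd

# Same contrived data as the question, with the shoes rows out of date order
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-02', '2020-01-01', '2020-01-01', '2020-01-02']),
    'Product Category': ['shoes', 'shoes', 'grocery', 'grocery'],
    'Sales': [180, 100, 21, 22],
    'Colors': ['red', 'red', 'green', 'green'],
})

def split_by_category(frame):
    """Hypothetical helper: return {category: (color, rows sorted by date)}."""
    groups = {}
    for category, rows in frame.groupby('Product Category', sort=True):
        rows = rows.sort_values('Date')
        # Assumes every row of a category carries the same color
        groups[category] = (rows['Colors'].iloc[0], rows)
    return groups

groups = split_by_category(df)
for category, (color, rows) in groups.items():
    print(category, color, rows['Sales'].tolist())
```

Each `rows` frame can then be passed as `source` to `plot.line` with `line_color=color`, in the same way as the loop above.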

Upvotes: 0

syntonym

Reputation: 7384

With Bokeh it is often helpful to first think about the visualisation you want and then structure the data source accordingly. You want two lines, one per category, with time on the x axis and sales on the y axis. A natural way to structure your data source is then one column per category:

import pandas as pd

df = pd.DataFrame({'Date':['2020-01-01','2020-01-02'],
                'Shoe Sales':[100, 180],
                'Grocery Sales': [21, 22]
              })
# Parse the date strings so they work with the datetime x axis
df['Date'] = pd.to_datetime(df['Date'])

from bokeh.io import output_notebook
output_notebook()
from bokeh.io import show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", plot_width=800, toolbar_location=None)

categories = ["Shoe Sales", "Grocery Sales"]
colors = {"Shoe Sales": "red", "Grocery Sales": "green"}

# One scatter and one line per category, both drawn from the same source
for category in categories:
    # legend_label uses the category name as literal legend text
    plot.scatter(x="Date",y=category,size=15, source=source, fill_color=colors[category], legend_label=category)
    plot.line(x="Date",y=category,source=source, line_color=colors[category])

show(plot)
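If the data starts out in the long format from the question (one row per date and category), a pivot is one way to reach the wide layout used above. This is a minimal sketch: the names long_df and wide_df are illustrative, and the resulting columns are named after the category values ('grocery', 'shoes') rather than 'Shoe Sales' and 'Grocery Sales'.

```python
import pandas as pd

# Long format, as in the question: one row per (date, category)
long_df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-02']),
    'Product Category': ['shoes', 'shoes', 'grocery', 'grocery'],
    'Sales': [100, 180, 21, 22],
})

# Wide format: one row per date, one sales column per category
wide_df = (
    long_df.pivot(index='Date', columns='Product Category', values='Sales')
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'Product Category' axis label

# Every column except 'Date' is a category to loop over when plotting
categories = [c for c in wide_df.columns if c != 'Date']
print(wide_df)
```

The `categories` list can replace the hard-coded one in the loop above, with the `colors` dictionary keyed by the same names.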

Upvotes: 0
