Stefan Mehren

Reputation: 35

"BokehUserWarning: ColumnDataSource's columns must be of the same length"

I am trying to plot a fairly big amount of data reaching back all the way to the year 1998.

My code seems to work fine, but when I run it I get the message "BokehUserWarning: ColumnDataSource's columns must be of the same length".

Here is my code:

import pandas as pd
from bokeh.io import show, output_file
from bokeh.plotting import figure

#Create dataframe
df = pd.read_csv('/Users/macbook/Desktop/source.tab', names=[
'#','datesent','total','place'], delimiter='\t', header=None, encoding="ISO-8859-1")

#Format date
df['datesent'] = pd.to_datetime(df['datesent'], dayfirst=True)

#Datamunging   
transactionssent = dict(pd.melt(df,value_vars=['datesent']).groupby('value').size())        
transactionssent_dataframe = pd.DataFrame.from_dict(transactionssent, orient= 'index')     
transactionssent_dataframe.columns = ['Number of sent transactions']                           
transactionssent_dataframe.index.rename('Date of sending', inplace= True)                         

#X- and Y-axis
x = pd.bdate_range('2017-1-1', '2200-1-1')
y = transactionssent_dataframe['Number of sent transactions']

#Bokeh object
ts = figure(x_axis_type="datetime")

#Show plot
ts.line(x, y)

output_file('/Users/macbook/Desktop/plot.html')

All the output is actually as expected. What does the message mean? Do I really have to create a ColumnDataSource object from the dataframe? I thought passing a pandas dataframe directly to a Bokeh plotting function was a good way to get the graph I wanted. Is there a best practice for creating a Bokeh plot from a pandas dataframe?

Upvotes: 2

Views: 8690

Answers (2)

horseshoe

Reputation: 1477

This answer is not directly related to the question, but it concerns the same warning. If you change the length of a ColumnDataSource in an interactive plot one column at a time, you will get the same warning. Say your data source is:

source = ColumnDataSource(
    data=dict(
        x=list(np.zeros(10)),
        y=list(np.ones(10)),
    )
)
p1 = plot.line(x='x', y='y', source=source, line_alpha=1, color="red")

Suppose the data you want to update it with has a different length, e.g. 8. You could assign the columns one by one:

p1.data_source.data['x'] = list(np.zeros(8))
p1.data_source.data['y'] = list(np.ones(8))

This produces the same warning as in the question, because after the first assignment 'x' has 8 entries while 'y' still has 10. To avoid the warning, set all the values at once with a dict:

p1.data_source.data = {'x': list(np.zeros(8)),
                       'y': list(np.ones(8))}
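
If you update the source from several places, it can help to wrap the single assignment in a small function that checks the lengths first. This is only a sketch: replace_columns is a made-up helper name, not part of Bokeh, and the source here is built directly rather than taken from p1.data_source.

```python
import numpy as np
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(x=list(np.zeros(10)), y=list(np.ones(10))))

def replace_columns(source, new_data):
    """Hypothetical helper (not a Bokeh API): swap every column in one step."""
    lengths = {name: len(values) for name, values in new_data.items()}
    # Refuse mismatched columns up front instead of relying on the warning.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # One assignment, so the source never holds columns of mixed length.
    source.data = dict(new_data)

replace_columns(source, {"x": list(np.zeros(8)), "y": list(np.ones(8))})
```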

Upvotes: 7

Luke Canavan

Reputation: 2137

I assume the warning comes from your x and y series having different lengths. The plot probably just cuts off the overhanging part of the longer array, which would explain why the output still looks right.
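
A quick way to check this is to compare the two lengths directly. The sketch below uses a few made-up dates in place of your file (that part is an assumption), together with the same bdate_range call as in the question:

```python
import pandas as pd

# Made-up stand-in for the parsed 'datesent' column (assumption, not the real data).
datesent = pd.Series(pd.to_datetime(
    ["2017-01-02", "2017-01-02", "2017-01-03", "2017-01-05"]))

# One count per distinct date, similar to the groupby/size step in the question.
y = datesent.value_counts().sort_index()

# The x range from the question: every business day from 2017 to 2200.
x = pd.bdate_range('2017-1-1', '2200-1-1')

print(len(x), len(y))  # the two lengths differ, which is what Bokeh warns about
```

If the lengths differ like this, taking x from the same object as y (for example y.index) keeps the two in step.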

You don't "have to" create a ColumnDataSource manually (one is created internally when you pass arrays to a glyph method like line), but building it yourself from a single dataframe means every column has the same length, which prevents this situation.

You can create a ColumnDataSource directly from a dataframe via:

source = ColumnDataSource(dataframe)
ts.line(x='x', y='y', source=source)
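
Applied to data shaped like yours, that could look roughly like the following. It is a sketch under assumptions: the dates are made up, and the column names "datesent" and "count" are just the ones chosen here. The x and y values refer to columns of the same dataframe, so they always have the same length.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the parsed file (assumption).
df = pd.DataFrame({"datesent": pd.to_datetime(
    ["2017-01-02", "2017-01-02", "2017-01-03", "2017-01-05"])})

# Count rows per date and keep the dates as an ordinary column.
counts = df.groupby("datesent").size().reset_index(name="count")

# Both glyph fields come from the same source, so their lengths match.
source = ColumnDataSource(counts)
ts = figure(x_axis_type="datetime")
ts.line(x="datesent", y="count", source=source)
```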

Upvotes: 6
