Reputation: 35
I am trying to plot a fairly large amount of data reaching back to 1998.
My code seems to work fine, but running it emits the warning "BokehUserWarning: ColumnDataSource's columns must be of the same length"
Here is my code:
import pandas as pd
from bokeh.io import show, output_file, gridplot
from bokeh.plotting import figure
#Create dataframe
df = pd.read_csv('/Users/macbook/Desktop/source.tab', names=[
'#','datesent','total','place'], delimiter='\t', header=None, encoding="ISO-8859-1")
#Format date
df['datesent'] = pd.to_datetime(df['datesent'], dayfirst=True)
#Datamunging
transactionssent = dict(pd.melt(df,value_vars=['datesent']).groupby('value').size())
transactionssent_dataframe = pd.DataFrame.from_dict(transactionssent, orient= 'index')
transactionssent_dataframe.columns = ['Number of sent transactions']
transactionssent_dataframe.index.rename('Date of sending', inplace= True)
#X- and Y-axis
x = pd.bdate_range('2017-1-1', '2200-1-1')
y = transactionssent_dataframe['Number of sent transactions']
#Bokeh object
ts = figure(x_axis_type="datetime")
#Show plot
ts.line(x, y)
output_file('/Users/macbook/Desktop/plot.html')
All the output is actually as expected. What does the warning mean? Do I really have to create a ColumnDataSource object from the dataframe? I thought passing a pandas dataframe directly to a bokeh plotting function was a good way to get the graph I wanted. Is there a best practice for creating a bokeh plot from a pandas dataframe?
Upvotes: 2
Views: 8690
Reputation: 1477
This answer is not directly related to the question but concerns the same warning: in an interactive plot, you get it if you change the length of a ColumnDataSource's columns one at a time. Suppose your data source is:
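import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

plot = figure()  # assumed: the original snippet does not show how plot is created
source = ColumnDataSource(
    data=dict(
        x=list(np.zeros(10)),
        y=list(np.ones(10)),
    )
)
p1 = plot.line(x='x', y='y', source=source, line_alpha=1, color="red")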
and the new data has a different length, e.g. 8. You could update the columns one by one:
p1.data_source.data['x'] = list(np.zeros(8))
p1.data_source.data['y'] = list(np.ones(8))
which produces the warning above, because after the first assignment x has 8 entries while y still has 10. To avoid the warning, replace all columns at once by assigning a new dict:
p1.data_source.data = {'x': list(np.zeros(8)),
                       'y': list(np.ones(8))}
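The difference between the two update styles can be checked without opening a plot. The following is a minimal sketch, assuming a recent Bokeh release in which BokehUserWarning is importable from bokeh.util.warnings; the helper length_warnings and the list lengths are illustrative choices, not part of the answer above.

```python
import warnings

from bokeh.models import ColumnDataSource
from bokeh.util.warnings import BokehUserWarning


def length_warnings(update):
    """Run update() and return the BokehUserWarning messages it triggered.

    Hypothetical helper, introduced only for this illustration.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not suppress repeated warnings
        update()
    return [str(w.message) for w in caught
            if issubclass(w.category, BokehUserWarning)]


source = ColumnDataSource(data=dict(x=[0.0] * 10, y=[1.0] * 10))


def stepwise():
    # After the first assignment x has 8 entries while y still has 10.
    source.data['x'] = [0.0] * 8
    source.data['y'] = [1.0] * 8


def all_at_once():
    # Both columns are replaced in one assignment, so their lengths
    # never disagree at the moment Bokeh checks them.
    source.data = {'x': [0.0] * 6, 'y': [1.0] * 6}


stepwise_messages = length_warnings(stepwise)
atomic_messages = length_warnings(all_at_once)

print("column-by-column:", len(stepwise_messages), "warning(s)")
print("single dict:", len(atomic_messages), "warning(s)")
```

Recording the warnings rather than letting them print makes it easy to see which of the two assignments is responsible.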
Upvotes: 7
Reputation: 2137
I assume the warning comes from your x and y series having different lengths. The output is probably cutting off the overhanging section of the longer array, if that makes sense.
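A quick way to confirm this is to compare the lengths before plotting. The sketch below uses only pandas; the four sample dates are a hypothetical stand-in for the question's datesent column, and taking x from the index of the counts is one possible fix, not necessarily the one the asker needs.

```python
import pandas as pd

# Hypothetical stand-in for the 'datesent' column of the question's dataframe.
datesent = pd.to_datetime(
    ['2017-01-02', '2017-01-02', '2017-01-03', '2017-01-05'])

# One entry per distinct date, comparable to the counts built in the question.
y = pd.Series(datesent).value_counts().sort_index()

# x built independently of the data, as in the question:
# its length has nothing to do with the number of distinct dates.
x_independent = pd.bdate_range('2017-1-1', '2200-1-1')

# x taken from the index of the counts: same length as y by construction.
x_matching = y.index

print("independent x:", len(x_independent), "vs y:", len(y))
print("matching x:   ", len(x_matching), "vs y:", len(y))
```

With x taken from the index, each count is also plotted against the date it belongs to rather than against an unrelated business day.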
You don't "have to" create a ColumnDataSource manually (one is created internally when you pass arrays to a glyph method like line), but it performs validation, such as this column-length check, that helps prevent this situation.
You can create a ColumnDataSource directly from a dataframe; the x and y arguments then name columns of that dataframe:
source = ColumnDataSource(dataframe)
ts.line(x='x', y='y', source=source)
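Applied to data shaped like the question's, this could look as follows. It is a sketch under assumptions: the sample dates and the column name n_sent are made up for illustration, and the real data would come from the tab-separated file instead.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical sample standing in for the question's tab-separated file.
df = pd.DataFrame({'datesent': pd.to_datetime(
    ['2017-01-02', '2017-01-02', '2017-01-03',
     '2017-01-05', '2017-01-05', '2017-01-05'])})

# Count rows per date; reset_index turns the result into a dataframe
# with the columns 'datesent' and 'n_sent' (name chosen for this example).
counts = df.groupby('datesent').size().reset_index(name='n_sent')

# Every dataframe column becomes a column of the source,
# so all columns have the same length by construction.
source = ColumnDataSource(counts)

ts = figure(x_axis_type="datetime")
ts.line(x='datesent', y='n_sent', source=source)
```

Calling show(ts) from bokeh.io afterwards renders the plot. Because x and y come from the same dataframe, the length warning cannot occur here.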
Upvotes: 6