Nezo
Nezo

Reputation: 597

Python interactive plotting for large data sets

Suppose I have a dataset with 100k rows (1000 different times, 100 different series, an observation for each, and auxilliary information). I'd like to create something like the following: (1) first panel of plot has time on x axis, and average of the different series (and standard error) on y axis. (2) based off the time slice (vertical line) we hover over in panel 1, display a (potentially down sampled) scatter plot of auxilliary information versus the series value at that time slice.

I've looked into a few options for this: (1) matplotlib + ipywidgets doesn't seem to handle it unless you explicitly select points via a slider. This also doesn't translate well to html exporting. This is not ideal, but is potentially workable. (2) altair - this library is pretty sleek, but from my understanding, I need to give it the whole dataset for it to handle the interactions, but it also can't handle more than 5kish data points. This would preclude my use case, correct?

Any suggestions as to how to proceed? Is what I'm asking impossible in the current state of things?

Upvotes: 0

Views: 1603

Answers (1)

joelostblom
joelostblom

Reputation: 48899

You can work with datasets larger than 5k rows in Altair, as specified in this section of the docs.

One of the most convenient solutions in my opinion is to install altair_data_server and then add alt.data_transformers.enable('data_server') on the top of your notebooks and scripts. This server will provide the data to Altair as long as your Python process is running so there is no need to include all the data as part of the created chart specification, which means that the 5k error will be avoided. The main drawback is that it wont work if you export to a standalone HTML because you rely on being in an environment where the server Python process is running.

Upvotes: 1

Related Questions