Ken Lin

Reputation: 1035

Options for improving interactive Altair line charts with many rows

I need to plot a pandas DataFrame of shape (50,000, 2) as an interactive line chart. One column contains datetime64[ns] values, and the other contains floating-point numbers.

Unfortunately, with this much data the interactive chart becomes quite slow to pan and zoom. My chart is very basic, and once I reduce the number of rows below roughly 10,000, performance becomes noticeably better. I have read the FAQ and understand that the official guidelines don't recommend more than 5,000 rows to begin with. Nevertheless, I am looking for ways to improve performance, and there doesn't seem to be a lot of discussion on this.

I don't need to display all 50,000 data points at once, but I do want all 50,000 stored in a self-contained .html file (i.e., the chart).

I am thinking along the lines of "saving the plot to disk", "using a slider to create a sliding-window effect that limits the number of data points displayed at once", "using a more efficient data type", or "changing the appearance of the lines so that they are less intensive to render". Really, anything that might help is fine. For context, I am comparing the performance against more dedicated visualization software like Matlab, in which I have no problem making an interactive line chart with this much data.

Alternatively, I'm also happy to hear an explanation for why this much data isn't feasible to plot interactively due to constraints in HTML, JSON, Altair, Vega, or whatever else.

Upvotes: 3

Views: 964

Answers (1)

joelostblom

Reputation: 48919

There are ongoing efforts to make Vega more performant (including via WebGL), which you can read about here: https://github.com/vega/vega/issues/2619. Until those land, I think your best bet is to zoom into a smaller area, which sounds like it could work well since you mentioned not needing to display all points at once. I find that using the data_server backend can also help with some large-data slowdowns, although it does nothing for rendering speed.

import pandas as pd
import numpy as np
import altair as alt


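# Serve the data from a local server instead of embedding it in the chart spec
# (requires the altair_data_server package)
alt.data_transformers.enable('data_server')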

N=50000
test_df = pd.DataFrame({'t' : range(0, N, 1),
                        'A' : np.random.randint(0, 100, size=N)})

alt.Chart(test_df).mark_point().encode(
    alt.X('t', scale=alt.Scale(domain=[4000, 6000])),
    alt.Y('A', scale=alt.Scale(domain=[40, 60]))).interactive()
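
The zoom-in idea can also be applied on the data side: rather than only narrowing the scale domain, you could hand the chart just the rows that fall inside the window. Here is a minimal sketch in plain Python, assuming `t` is sorted in ascending order. `window_rows` is a hypothetical helper written for illustration, not an Altair or pandas function, and the `values` column is a deterministic stand-in for the random data above.

```python
from bisect import bisect_left, bisect_right


def window_rows(t, values, lo, hi):
    """Return the (t, value) pairs with lo <= t <= hi.

    Assumes t is sorted ascending, so the window bounds can be
    located with binary search instead of scanning every row.
    """
    start = bisect_left(t, lo)    # first index with t[i] >= lo
    stop = bisect_right(t, hi)    # one past the last index with t[i] <= hi
    return list(zip(t[start:stop], values[start:stop]))


N = 50000
t = list(range(N))
values = [(i * 37) % 100 for i in t]  # stand-in data, not the original column

# Same window as the x-axis domain used in the chart above
rows = window_rows(t, values, 4000, 6000)
print(len(rows), "of", N, "rows fall inside the window")
```

The result could then be turned into a DataFrame with `pd.DataFrame(rows, columns=['t', 'A'])` and charted as above. The trade-off is that rows outside the window are no longer part of the chart, so this only fits if a reduced view is acceptable.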

Upvotes: 1
