Reputation: 312
I'm visualizing scatterplots with between 400K and 2.5M points. I expected to need to downsample before visualizing, but to see just how much, I ran a pilot test with a 400K-point dataset in plotly express, and the plot popped up quickly, beautifully, and responsively.
To make the interactive figure I really need to use plotly.graph_objects, because I need multiple traces with different colorscales. So I made basically the same graph with graph_objects, and it wasn't just slower, it crashed my computer.
I'd really like to downsample as little as possible, and I'm surprised by the sheer performance difference between these two approaches, so my question boils down to this:
Why is there such a performance difference, and is it possible to change layout/figure/whatever parameters in graph_objects to close the gap?
Here is a snippet to show what I mean by basically the same graph:
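import plotly.express as px
import plotly.graph_objects as go

# x_values, y_values, community, size, opacity and colorscale are defined elsewhere

# graph_objects version (crashes)
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_values,
    y=y_values,
    opacity=opacity,
    marker={'size': size, 'color': community, 'colorscale': colorscale},
))

# plotly express version (fast)
pacmap_map = px.scatter(x=x_values, y=y_values, color=community,
                        color_continuous_scale=colorscale, opacity=opacity)
pacmap_map.update_traces(marker={'size': size})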
I would have expected performance to be identical, or at least in the same ballpark, but express works like a dream while graph_objects crashes the Jupyter kernel and whatever IDE it is running from.
Upvotes: 5
Views: 5341
Reputation: 13185
Running the following simple example:
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
x = np.linspace(-2, 2, 100000)
y = np.cos(x)
fig = go.Figure(data=[go.Scatter(x=x, y=y)])
fig2 = px.scatter(x=x, y=y)
type(fig.data[0]), type(fig2.data[0])
# out: (plotly.graph_objs._scatter.Scatter, plotly.graph_objs._scattergl.Scattergl)
As you can see, plotly express appears to switch to Scattergl when the number of points is higher than some threshold. Scattergl draws with WebGL on an HTML5 canvas, so rendering is GPU-accelerated, which is why it is efficient. Scatter, by contrast, creates SVG elements that get inserted into the current document, consuming much more memory.
Upvotes: 7