Reputation: 445
I'm trying to plot on the order of a million data points in R. I'm currently using ggplot2 (but I'm open to suggestions of alternative packages). The problem is that the graph takes too long to render (often upwards of a minute). I'm looking for ways to do this faster -- in real time ideally. I would appreciate any help -- attaching code to the question for clarity.
Creating a (random) data frame with 100000 rows and 5 columns (~500000 values):
# note: this masks R's built-in `letters` vector for the session
letters <- c("A", "B", "C", "D", "E", "F", "G")
myLetters <- sample(x = letters, size = 100000, replace = TRUE)
direction <- c("x", "y", "z")
factor1 <- sample(x = direction, size = 100000, replace = TRUE)
factor2 <- runif(100000, 0, 20)
factor3 <- runif(100000, 0, 100)
decile <- sample(x = 1:10, size = 100000, replace = TRUE)
new.plot.df <- data.frame(letters = myLetters, factor1 = factor1, factor2 = factor2,
                          factor3 = factor3, decile = decile)
Now, plotting the data:
library(ggplot2)
color.plot <- ggplot(new.plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(aes(alpha = factor2)) +
  facet_grid(decile ~ letters)
color.plot  # printing the object is what triggers the (slow) rendering
How do I make the rendering faster?
Upvotes: 18
Views: 11056
Reputation: 940
There are two main sources of slowness in R plotting:
The graphical back-end. It can be changed with the appropriate device-opening and backend-selection commands -- for me, this usually helps:
options(bitmapType='cairo') #set the drawing backend, this may speed up PNG rendering
x11(type='cairo') #drawing to X11 window using cairo is the fastest interactive output for me
(X11 is not available on Windows and is a little confusing in RStudio, but that's a different story)
Plotting simpler shapes helps quite a lot. ggplot uses some variant of pch=19 or pch=20 by default, which are way too slow because of anti-aliasing. You can usually get about 10x faster rendering by using pch='.' (which is just a single non-aliased pixel) or pch=16 (which is a small non-aliased circle). That also applies to ggplot, with shape='.' and shape=16, respectively. If you have a lot of points and set an appropriately lower alpha, you'll get the "anti-aliasing" for free.
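As a rough sketch, this is how the shape advice could be applied to the plot from the question. It assumes ggplot2 is installed. The names `n`, `df` and `fast.plot` and the fixed `alpha = 0.3` are illustrative choices of mine, and the fixed alpha replaces the question's `aes(alpha = factor2)` mapping:

```r
library(ggplot2)

set.seed(1)  # reproducible illustrative data, same shape as in the question
n <- 100000
df <- data.frame(
  letters = sample(c("A", "B", "C", "D", "E", "F", "G"), n, replace = TRUE),
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100),
  decile  = sample(1:10, n, replace = TRUE)
)

# shape = "." requests the single-pixel point; a low constant alpha
# (an assumed value) lets overlapping points build up visible density
fast.plot <- ggplot(df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(shape = ".", alpha = 0.3) +
  facet_grid(decile ~ letters)

# print(fast.plot)  # uncomment to render on the current device
```

Wrapping the `print()` call in `system.time()` for both this plot and the original one is a simple way to check how much the change helps on your own machine and device.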
For me, just switching the graphical backend and setting different point shape improved drawing of 1 million points from around 30 minutes to seconds. 500k data points should be rendered in under a second.
EDIT (Jan 2020): I recently made a library that speeds this up even more: https://github.com/exaexa/scattermore
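A minimal sketch of using it from ggplot2, assuming the scattermore package is installed and that `geom_scattermore()` accepts the `pointsize` and `pixels` arguments as I remember them from its README (please check the package documentation for the current interface). The data and the names `n`, `df` and `p` are made up for illustration:

```r
library(ggplot2)
library(scattermore)

set.seed(1)  # illustrative data only
n <- 1000000
df <- data.frame(x = rnorm(n), y = rnorm(n))

# geom_scattermore() rasterizes the points into a bitmap of the given
# pixel size, so the device draws one image instead of n separate points
p <- ggplot(df, aes(x = x, y = y)) +
  geom_scattermore(pointsize = 1, alpha = 0.1, pixels = c(800, 800))

# print(p)  # uncomment to render
```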
Upvotes: 21
Reputation: 126
In general there are two strategies that I use for this:
1) As described in the comments, plotting a reasonably representative sample of your data will not noticeably change the plot, and it reduces the number of points to render.
2) One trick that I use is to create the plot object without displaying it and save it straight to a PNG image instead. This speeds things up a lot, because the image you open is a raster rather than a vector image.
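Both strategies can be sketched together roughly like this, assuming ggplot2 is installed. The data, the 50000-row sample size, the output path and the image dimensions are all arbitrary illustrative choices:

```r
library(ggplot2)

set.seed(1)  # illustrative data only
n <- 500000
full.df <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 20))

# 1) Plot a random sample of rows instead of every point
sample.df <- full.df[sample(nrow(full.df), 50000), ]

p <- ggplot(sample.df, aes(x = x, y = y)) +
  geom_point(alpha = 0.5)

# 2) Write the plot straight to a PNG file instead of drawing it on screen
out.file <- file.path(tempdir(), "scatter.png")
ggsave(out.file, plot = p, width = 8, height = 6, dpi = 150)
```

A simple random sample keeps the overall shape of a dense cloud but can drop rare outliers, so it is worth comparing against the full plot once if those matter.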
Upvotes: 4