Iterator
Iterator

Reputation: 20560

Wrapping R's plot function (or ggplot2) to prevent plotting of large data sets

Rather than ask how to plot big data sets, I want to wrap plot so that code that produces a lot of plots doesn't get hammered when it is plotting a large object. How can I wrap plot with a very simple manner so that all of its functionality is preserved, but first tests to determine whether or not the object being passed is too large?

This code works for very vanilla calls to plot, but it's missing the same generality as plot (see below).

myPlot <- function(x, ...){
    isBad <- any( (length(x) > 10^6) || (object.size(x) > 8*10^6) || (nrow(x) > 10^6) )
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    return(plot(x, ...))
}

x = rnorm(1000)
x = rnorm(10^6 + 1)

myPlot(x)

An example where this fails:

x = rnorm(1000)
y = rnorm(1000)
plot(y ~ x)
myPlot(y ~ x)

Is there some easy way to wrap plot to enable this checking of the data to be plotted, while still passing through all of the arguments? If not, then how about ggplot2? I'm an equal opportunity non-plotter. (In the cases where the dataset is large, I will use hexbin, sub-sampling, density plots, etc., but that's not the focus here.)


Note 1: When testing ideas, I recommend testing for size > 100 (or set a variable, e.g. myThreshold <- 1000), rather than versus a size of > 1M - otherwise there will be a lot of pain in hitting the slow plotting. :)

Upvotes: 7

Views: 1203

Answers (1)

Gavin Simpson
Gavin Simpson

Reputation: 174853

The problem you have is that as currently coded, myplot() assumes x is a data object, but then you try to pass it a formula. R's plot() achieves this via methods - when x is a formula, the plot.formula() method gets dispatched to instead of the basic plot.default() method.

You need to do the same:

myplot <- function(x, ...)
    UseMethod("myplot")

myplot.default <- function(x, ....) {
    isBad <- any((length(x) > 10^6) || (object.size(x) > 8*10^6) || 
                    (nrow(x) > 10^6))
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}

myplot.formula <- function(x, ...) {
    ## code here to process the formula into a data object for plotting
    ....
    myplot.default(processed_x, ...)
}

You can steal code from plot.formula() to use in the code needed to process x into an object. Alternatively, you can roll your own following the standard non-standard evaluation rules (PDF).

Upvotes: 6

Related Questions