srz

Reputation: 116

R: Similar plot with big and small data frame

I am trying to find a way to plot data frames of different sizes using the same function. The data is quite similar to the data frames below. The order of the x values is not important.

GetDf <- function(n)
  data.frame(x = seq(1, n), y = rnorm(n, 3.5, 0.5), group = runif(n) > 0.5)

library(ggplot2)

PlotIt <- function(df) {
  p <- ggplot(df) + geom_point(aes(x = x, y = y, colour = group)) +
        expand_limits(y = c(1, 5)) +
        # the reference lines are constants, so yintercept is set outside aes()
        geom_hline(yintercept = c(2.5, 4.5), linetype = "dotdash")
  print(p)
}

df1 <- GetDf(1000)
df2 <- GetDf(10000)
df3 <- GetDf(100000)
df4 <- GetDf(1000000)

PlotIt(df1) looks OK, but PlotIt(df2) is already bad: the points overlap. I could make the points smaller when n is large, but then the plots of df1 to df4 would look radically different. With a fixed size, the plot of df3 needs something like size = 0.75, and at that size PlotIt(df1) looks bad.

I know about the hexbin library and geom_hex(), but they don't seem to produce what I want. I would like the groups shown in different colors, and hexbin is not good for plotting small data frames such as df1.

What would be the best way to plot at least df1 to df3, and preferably also df4, so that the plots "feel" the same and look good? (I'm sorry about the vagueness, but I don't know how to be more specific.)

Upvotes: 1

Views: 339

Answers (2)

srz

Reputation: 116

I followed krlmlr's answer and wrote a function that calculates alpha from the row count of df. Choosing a better shape (open circles, shape = 1) also made the plots nicer. override.aes is needed so that the legend keys stay visible when alpha is low.

PlotIt <- function(df) {
  # alpha falls linearly in log(n) and is clamped to the range [0.1, 1]
  Alpha <- function(x) pmax(0.1, pmin(1, 2.05 - 0.152 * log(x)))
  p <- ggplot(df) +
    geom_point(aes(x = x, y = y, colour = group), size = 1.5,
               shape = 1, alpha = Alpha(nrow(df))) +
    expand_limits(y = c(1, 5)) +
    # the reference lines are constants, so yintercept is set outside aes()
    geom_hline(yintercept = c(2.5, 4.5), linetype = "dotdash") +
    # draw the legend keys fully opaque even when the points are faint
    guides(colour = guide_legend(override.aes = list(alpha = 1)))
  print(p)
}

The plots of df1 to df3 look OK to me (full screen). The question is somewhat similar to "Scatterplot with too many points". The differences: the same function should apply to big and small data frames, and the order of the x values is not important.
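
To see what the formula does across the sizes in the question, here is a quick check in base R. The coefficients are the ones from PlotIt above; since 0.152 * log(10) is about 0.35, alpha drops by roughly 0.35 for every tenfold increase in rows, starting from 1 at n = 1000 and reaching the 0.1 floor before n = 1,000,000.

```r
# Same formula as the Alpha helper inside PlotIt
Alpha <- function(x) pmax(0.1, pmin(1, 2.05 - 0.152 * log(x)))

# Row counts of df1 to df4 from the question
n <- c(1e3, 1e4, 1e5, 1e6)
alpha_by_n <- data.frame(n = n, alpha = round(Alpha(n), 2))
print(alpha_by_n)
# alpha column: 1.00, 0.65, 0.30, 0.10
```

The numbers were tuned by eye for a full-screen plot, so treat them as a starting point rather than a rule.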

Upvotes: 3

krlmlr

Reputation: 25464

I suspect you don't want to trace individual points in a scatter plot of 1000 or more points. Why don't you use a sample?

PlotIt <- function(df) {
  df <- sample.rows(df, 1000, replace = FALSE)
  ...
}

(sample.rows is in my kimisc package).
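
If you would rather not depend on kimisc, the same idea can be sketched in base R. SampleRows is a hypothetical helper name, and returning small data frames unchanged is an assumption made here so that the one function works for every size.

```r
# Hypothetical helper: keep at most `n` randomly chosen rows of `df`.
SampleRows <- function(df, n = 1000) {
  if (nrow(df) <= n) return(df)                 # small data frames pass through unchanged
  df[sample.int(nrow(df), n), , drop = FALSE]   # row indices drawn without replacement
}

set.seed(1)
big <- data.frame(x = seq_len(10000), y = rnorm(10000, 3.5, 0.5))
small <- big[1:50, ]

nrow(SampleRows(big))    # 1000
nrow(SampleRows(small))  # 50
```

Because the x order does not matter in the question, the sampled rows are not re-sorted.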

If you really want to show all points, export the plot as a raster image rather than a vector image (a vector file with that many points takes ages to render) and use an alpha value in geom_point:

  geom_point(aes(...), alpha=get_reasonable_alpha_value(df))

You'll have to do some experimentation for implementing get_reasonable_alpha_value. It should return a value between 0 (fully transparent) and 1 (opaque).
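
The raster-export advice could look roughly like this with ggsave, which picks a raster device when the file name ends in .png. The file name, dimensions, dpi and the alpha of 0.3 are placeholder assumptions, not recommended values.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = seq_len(5000), y = rnorm(5000, 3.5, 0.5))

# Semi-transparent points; 0.3 is an arbitrary placeholder alpha
p <- ggplot(df) + geom_point(aes(x = x, y = y), alpha = 0.3)

# A .png extension makes ggsave write a raster image instead of a vector file
out <- file.path(tempdir(), "scatter.png")
ggsave(out, plot = p, width = 8, height = 5, dpi = 150)
```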

Perhaps a two-dimensional density estimation will suit you better:

  geom_density2d(...)
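
A minimal sketch of the density idea, assuming data shaped like the question's GetDf output. geom_density_2d is the current spelling of geom_density2d in ggplot2; mapping colour to group draws one set of contours per group, so the groups stay distinguishable.

```r
library(ggplot2)

set.seed(42)
n <- 10000
df <- data.frame(x = seq_len(n), y = rnorm(n, 3.5, 0.5), group = runif(n) > 0.5)

# One set of contour lines per group, plus the same reference lines as in the question
p <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_density_2d() +
  expand_limits(y = c(1, 5)) +
  geom_hline(yintercept = c(2.5, 4.5), linetype = "dotdash")
print(p)
```

Contours do not show individual points, so for a small data frame such as df1 a scatter plot may still read better.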

Upvotes: 2
