Reputation: 116
I am trying to find a way to plot data frames of different sizes using the same function. The data is quite similar to the data frames generated below. The order of the x values is not important.
library(ggplot2)

GetDf <- function(n) {
  data.frame(x = seq(1, n), y = rnorm(n, 3.5, 0.5), group = runif(n) > 0.5)
}

PlotIt <- function(df) {
  p <- ggplot(df) + geom_point(aes(x = x, y = y, colour = group)) +
    expand_limits(y = 1) + expand_limits(y = 5) +
    # constant reference lines are passed outside aes()
    geom_hline(yintercept = c(2.5, 4.5), linetype = "dotdash")
  print(p)
}
df1 <- GetDf(1000)
df2 <- GetDf(10000)
df3 <- GetDf(100000)
df4 <- GetDf(1000000)
PlotIt(df1) looks OK, but PlotIt(df2) is already bad because the points overlap. I could make the points smaller when n is large, but then the plots of df1 - df4 would look radically different. With one fixed size, the plot of df3 needs something like size = 0.75, and at that size PlotIt(df1) looks bad.
I know about the hexbin library and geom_hex(), but they don't seem to produce what I want: I would like the groups shown in different colors, and hexbin is not good for plotting df1, etc.
What would be the best way to plot at least df1 - df3, preferably also df4, so that the plots would "feel" the same and look good? (I'm sorry about the vagueness, but I don't know how to be more specific.)
Upvotes: 1
Views: 339
Reputation: 116
I followed krlmlr's answer and wrote a function that calculates alpha from the row count of df. Choosing a better shape also made the plots nicer. override.aes is needed so that the legend keys stay visible when alpha is low.
PlotIt <- function(df) {
  # alpha falls linearly in log(n) and is clamped to [0.1, 1]
  Alpha <- function(x) pmax(0.1, pmin(1, 2.05 - 0.152 * log(x)))
  p <- ggplot(df) +
    geom_point(aes(x = x, y = y, colour = group), size = 1.5,
               shape = 1, alpha = Alpha(nrow(df))) +
    expand_limits(y = 1) + expand_limits(y = 5) +
    # constant reference lines are passed outside aes()
    geom_hline(yintercept = c(2.5, 4.5), linetype = "dotdash") +
    guides(colour = guide_legend(override.aes = list(alpha = 1)))
  print(p)
}
Plots of df1 - df3 look OK to me (full screen). The question is somewhat similar to "Scatterplot with too many points". Differences: the same function should apply to big and small data frames, and the order of the x values is not important.
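To see what the Alpha helper above yields for the four test sizes, here is a quick check in base R. It only re-evaluates the formula from PlotIt; the sizes vector is simply the row counts of df1 to df4, and nothing here is tuned beyond what the answer already states.

```r
# Same formula as in PlotIt: alpha falls linearly in log(n), clamped to [0.1, 1].
Alpha <- function(x) pmax(0.1, pmin(1, 2.05 - 0.152 * log(x)))

sizes  <- c(1e3, 1e4, 1e5, 1e6)  # row counts of df1 .. df4
alphas <- Alpha(sizes)

# Show the alpha that each data frame size would be drawn with.
print(data.frame(n = sizes, alpha = round(alphas, 2)))
```

So df1 is drawn fully opaque, alpha drops to roughly 0.65 and 0.30 for df2 and df3, and df4 hits the 0.1 floor. The constants 2.05 and 0.152 were found by eye for this data, so other data sets will probably need different values.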
Upvotes: 3
Reputation: 25464
I suspect you don't want to trace individual points in a scatter plot of 1000 or more points. Why don't you use a sample?
PlotIt <- function(df) {
  df <- sample.rows(df, 1000, replace = FALSE)
  ...
}
(sample.rows is in my kimisc package).
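If you would rather not install a package for this, the sampling step can be sketched in base R. SampleRows below is a hypothetical helper written for this example, not the kimisc function, and the pass-through for small data frames is my own assumption about what you would want for df1-sized input.

```r
# Hypothetical helper (not kimisc::sample.rows): keep at most `size` random rows.
SampleRows <- function(df, size = 1000) {
  if (nrow(df) <= size) return(df)  # small data frames pass through unchanged
  # sample.int() draws row indices without replacement by default
  df[sample.int(nrow(df), size), , drop = FALSE]
}

set.seed(42)  # only to make the example reproducible
big   <- data.frame(x = seq_len(10000), y = rnorm(10000, 3.5, 0.5))
small <- big[1:200, ]

nrow(SampleRows(big))    # capped at 1000 rows
nrow(SampleRows(small))  # all 200 rows kept
```

Calling SampleRows(df) at the top of PlotIt would then give every plot at most 1000 points, at the cost of not showing the full data.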
If you really want to show all points, use an alpha value in geom_point. Be sure to export your plot as a raster image and not as a vector image; otherwise it will take ages to render:
geom_point(aes(...), alpha=get_reasonable_alpha_value(df))
You'll have to do some experimentation to implement get_reasonable_alpha_value. It should return a value between 0 (fully transparent) and 1 (opaque).
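For the raster export mentioned above, one option is ggsave() with a .png file name. This is only a sketch: the data frame mimics the one from the question, the alpha of 0.3 is an arbitrary placeholder rather than a tuned value, and the file location, size and dpi are assumptions you would adapt.

```r
library(ggplot2)

set.seed(1)  # only for reproducibility of the example data
n  <- 10000
df <- data.frame(x = seq_len(n), y = rnorm(n, 3.5, 0.5), group = runif(n) > 0.5)

# alpha = 0.3 is an arbitrary placeholder, not a tuned value
p <- ggplot(df) +
  geom_point(aes(x = x, y = y, colour = group), alpha = 0.3)

# A .png extension makes ggsave() use a raster device;
# .pdf or .svg would produce a vector file with one object per point.
out_file <- file.path(tempdir(), "scatter.png")
ggsave(out_file, plot = p, width = 8, height = 5, dpi = 150)
```

The resulting file size depends on the pixel dimensions and not on the number of points, which is why raster output stays manageable for large data frames.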
Perhaps a two-dimensional density estimation will suit you better:
geom_density2d(...)
Upvotes: 2