Reputation: 5497
I create a simple scatter plot with ggplot2
and visualise the z-variable with a color:
library(ggplot2)
# vis is F-distributed, so it has a long right tail
data <- data.frame(x = runif(1000), y = runif(1000), vis = rf(1000, df1 = 1, df2 = 3))
qplot(x = x, y = y, data = data, color = vis)
However, this is not very informative, since the distribution of vis is heavily skewed:
hist(data$vis)
The problem, in my opinion, is that the equidistant breaks create bins for value ranges that contain hardly any data in the sample.
So here is my question: is there an efficient way to overcome this problem and create more breaks where more data is available? In other words, I'm looking for non-linear or non-equidistant breaks.
Upvotes: 3
Views: 933
Reputation: 48191
Edit: probably something more similar to what you want:
breaks <- quantile(data$vis)
qplot(x = x, y = y, data = data, color = vis) +
  scale_colour_gradientn(
    colours = c("grey", "blue", "red"),
    breaks = as.vector(breaks),
    values = as.vector(breaks),
    labels = names(breaks),
    oob = identity, rescaler = function(x, ...) x
  )
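A related way to get the same effect, sketched here as an alternative rather than as part of the answer above, is to colour by the empirical CDF value of vis instead of vis itself. Colours are then spread evenly over the data, and the legend can still be labelled in the original units at chosen quantiles. The column name vis_rank, the fixed seed, and the choice of quartiles are assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(x = runif(1000), y = runif(1000),
                   vis = rf(1000, df1 = 1, df2 = 3))

# Empirical CDF value of each point: the fraction of the sample <= that value
data$vis_rank <- ecdf(data$vis)(data$vis)

# Legend breaks at the quartiles, labelled with the matching vis values
probs <- c(0, 0.25, 0.5, 0.75, 1)
q <- quantile(data$vis, probs)

p <- ggplot(data, aes(x = x, y = y, colour = vis_rank)) +
  geom_point() +
  scale_colour_gradientn(
    colours = c("grey", "blue", "red"),
    limits  = c(0, 1),
    breaks  = probs,
    labels  = format(signif(unname(q), 2)),
    name    = "vis"
  )
print(p)
```

Because the colour follows rank rather than magnitude, equal colour steps correspond to equal numbers of points, which is essentially what non-equidistant breaks are meant to achieve.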
Old answer: In this case, breaks are not what you really want:
qplot(x=x, y=y, data=data, color=vis) + scale_colour_gradient(breaks = 1:10 * 10)
Considering how the data we have is distributed:
quantile(data$vis, seq(0, 1, 0.1))
          0%          10%          20%          30%          40%
9.294095e-07 1.883887e-02 8.059213e-02 1.646752e-01 3.580304e-01
         50%          60%          70%          80%          90%
6.055612e-01 9.463869e-01 1.638687e+00 2.686160e+00 5.308239e+00
        100%
1.693077e+02
So possibly something like
qplot(x=x, y=y, data=data, color=vis) + scale_colour_gradient(limits = c(0,5))
would be good. Here, points with vis > 5 fall outside the limits and are drawn in grey. A more complicated solution, which you maybe wanted in the first place, would be this.
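If the grey points are unwanted, one possible variation (an assumption on my part, not something the answer above proposes) is to keep the same limits but squish out-of-range values to the nearest limit with squish() from the scales package, so that everything above 5 takes the colour at the top of the gradient.

```r
library(ggplot2)
library(scales)  # provides squish()

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(x = runif(1000), y = runif(1000),
                   vis = rf(1000, df1 = 1, df2 = 3))

# Same limits as above, but values outside [0, 5] are clamped to the
# nearest limit instead of being censored to NA (and drawn in grey)
p <- ggplot(data, aes(x = x, y = y, colour = vis)) +
  geom_point() +
  scale_colour_gradient(limits = c(0, 5), oob = squish)
print(p)
```

With this variant the top of the legend effectively means "5 or more", so the extreme tail is no longer distinguishable in the plot.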
Upvotes: 3