Ron Gejman

Reputation: 6215

Color a ggplot background according to a formula

I would like to make a ggplot scatterplot that has a colored background where the color at each point is dictated by the formula color = x * y. On top of this I would plot a bunch of points.

The purpose of the background is to allow the reader to quickly identify which points are "equivalent" because x*y is approximately the same value. I guess this would be accomplished with geom_raster and/or stat_function but I can't quite figure out how to string together the functions. Any insight/tips would be useful and I'll post the final solution.

Here's some skeleton code so you don't have to write an example.

library("ggplot2")

NRPercent <- function(x) {
    paste0(sapply(x * 100, scales::comma), "%")
}

data = data.frame(  count = c(  5e6,5e6,1e6,1e6, ## lots of experiments
                                    5e6,5e6,5e6, #RS22

                                    5e6,5e6,5e6,5e6,5e6, #RS30
                                    5e6,5e6,5e6,5e6, #RS30
                                    5e6,5e6,5e6,5e6,5e6, #RS30
                                    5e6,5e6,5e6,5e6,5e6, #RS30
                                    5e6,5e6,5e6,5e6, #RS30

                                    1e6,1e6,1e6,1e6,1e6, #RS31
                                    5e5,5e5,5e5,5e5,5e5, #RS31
                                    1e5,1e5,1e5,1e5,1e5, #RS31
                                    5e4,5e4,5e4,5e4,5e4 #RS31
                                    ),
                    percent = c(    1,1,1,1,
                                    0.13,0.475,0.83, 

                                    0.1,0.1,0.1,0.1,0.1,  #RS30
                                    0.01,0.01,0.01,0.01, #RS30
                                    0.001,0.001,0.001,0.001,0.001, #RS30
                                    0.0001,0.0001,0.0001,0.0001,0.0001, #RS30
                                    0.00001,0.00001,0.00001,0.00001, #RS30


                                    0.01,0.01,0.01,0.01,0.01,
                                    0.01,0.01,0.01,0.01,0.01,
                                    0.01,0.01,0.01,0.01,0.01,
                                    0.01,0.01,0.01,0.01,0.01
                                    ),
                    label = c(  "On","On","On","On",
                                    "On","On","On",

                                    "Not On","On","On","On","On",
                                    "Not On","On","On","On",
                                    "Not On","Not On","Not On","Not On","Not On",
                                    "Not On","Not On","Not On","Not On","Not On",
                                    "Not On","Not On","Not On","Not On",

                                    "Unknown","Unknown","Unknown","Unknown","Unknown",
                                    "Unknown","Unknown","Unknown","Unknown","Unknown",
                                    "Unknown","Unknown","Unknown","Unknown","Unknown",
                                    "Unknown","Unknown","Unknown","Unknown","Unknown"
                                ))

g = ggplot(data, aes(x=percent, y=count,color=label)) +
    geom_jitter(shape=16,width=0.2, height=0.1) +
    scale_y_continuous(trans='log1p',limits=c(40000,10000000),breaks=c(10e6,5e6,1e6,5e5,1e5,5e4,1e4)) +
    scale_x_continuous(trans='log',labels = NRPercent, expand=c(0,0), breaks=c(0,0.00001,0.0001,0.001,0.01,0.1,0.5)) +
    xlab("Percent")+
    ylab("Number") +
    theme_bw()


pdf("example_percent_vs_number.pdf")
print(g)
dev.off()

Upvotes: 2

Views: 280

Answers (1)

CPak

Reputation: 13581

You can try geom_raster like this. I used log10(count*percent) for the fill:

ggplot(data, aes(x=percent, y=count,color=label)) +
    # layers are drawn in the order they are added, so put the background first
    geom_raster(aes(fill=log10(count*percent))) +
    geom_jitter(shape=16,width=0.2, height=0.1) +
    scale_y_continuous(trans='log1p',limits=c(40000,10000000),breaks=c(10e6,5e6,1e6,5e5,1e5,5e4,1e4)) +
    scale_x_continuous(trans='log',labels = NRPercent, expand=c(0,0), breaks=c(0,0.00001,0.0001,0.001,0.01,0.1,0.5)) +
    xlab("Percent")+
    ylab("Number") +
    theme_bw()

Or use geom_tile:

ggplot(data, aes(x=percent, y=count,color=label)) +
    # layers are drawn in the order they are added, so put the background first
    geom_tile(aes(fill=log10(count*percent), x=percent, y=count)) +
    geom_jitter(shape=16,width=0.2, height=0.1) +
    scale_y_continuous(trans='log1p',limits=c(40000,10000000),breaks=c(10e6,5e6,1e6,5e5,1e5,5e4,1e4)) +
    scale_x_continuous(trans='log',labels = NRPercent, expand=c(0,0), breaks=c(0,0.00001,0.0001,0.001,0.01,0.1,0.5)) +
    xlab("Percent")+
    ylab("Number") +
    theme_bw()

You'll need to adjust the tile width, the tile height, and the color scale to your liking (I'd do it, but your axes are log-transformed). The example below shows that adjusting the tile size is trivial on linear axes:

ggplot(mtcars, aes(x=cyl,y=mpg)) + 
  # constant tile sizes are set outside aes(); x and y are inherited from ggplot()
  geom_tile(aes(fill=cyl*mpg), width=0.5, height=1) + 
  geom_point()

How to fill background

Conceptually, you need to fill in every point on your plot with a value, so build a grid that covers the plot area and compute the fill value at each grid point:

library(ggplot2)
library(dplyr)  # provides %>% and mutate()

# grid covering the range of the data in steps of 0.1
X <- seq(min(mtcars$cyl), max(mtcars$cyl), 0.1)
Y <- seq(min(mtcars$mpg), max(mtcars$mpg), 0.1)
SpecDens <- expand.grid(X,Y) %>%
             setNames(c("X","Y")) %>%
             mutate(D=X*Y)  # fill value at every grid point
ggplot(SpecDens, aes(X,Y)) + geom_raster(aes(fill=D))

Again, this is harder with your plot because it spans several orders of magnitude, but the above should get you started.
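
For axes that span several orders of magnitude, one option is to space the grid evenly in log10 units instead of in the raw values, so the cells come out the same size once log scales are applied. This is only a sketch under assumptions: the ranges are taken from the percent values and the y limits in the question, the 100 x 100 resolution and the names bg and p_bg are arbitrary, and scale_x_log10/scale_y_log10 stand in for the 'log' and 'log1p' transforms used in the question.

```r
library(ggplot2)

# Grid evenly spaced in log10 units
# (assumed ranges: percent 1e-5..1, count 4e4..1e7, as in the question)
log_x <- seq(log10(1e-5), log10(1), length.out = 100)
log_y <- seq(log10(4e4), log10(1e7), length.out = 100)
bg <- expand.grid(percent = 10^log_x, count = 10^log_y)

# Background value: x * y, taken on a log10 scale so the gradient stays readable
bg$value <- log10(bg$percent * bg$count)

p_bg <- ggplot(bg, aes(x = percent, y = count)) +
    geom_raster(aes(fill = value)) +
    scale_x_log10(expand = c(0, 0)) +
    scale_y_log10(expand = c(0, 0)) +
    labs(x = "Percent", y = "Number", fill = "log10(x * y)") +
    theme_bw()
```

Printing p_bg draws the background only; the fill is constant along lines where percent * count is constant, which is what makes "equivalent" points easy to spot.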

Also, to draw both you need to get the background-density values and the actual data points into the same plot, for example by merging them into a single data.frame.
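
If merging is inconvenient, each ggplot2 layer also accepts its own data argument, so the grid and the points can stay in separate data frames. Here is a minimal sketch on mtcars, assuming the same X * Y fill as in the grid example; the names grid and p and the 0.1 step are arbitrary choices.

```r
library(ggplot2)

# Background grid over the range of the data (the 0.1 step is an arbitrary choice)
grid <- expand.grid(
    X = seq(min(mtcars$cyl), max(mtcars$cyl), by = 0.1),
    Y = seq(min(mtcars$mpg), max(mtcars$mpg), by = 0.1)
)
grid$D <- grid$X * grid$Y

# Raster first so the points are drawn on top; each layer gets its own data
p <- ggplot() +
    geom_raster(data = grid, aes(x = X, y = Y, fill = D)) +
    geom_point(data = mtcars, aes(x = cyl, y = mpg))
```

Printing p shows the points over the colored background. The same pattern should carry over to the data in the question once the grid is built on the matching (log) scale.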

Upvotes: 1
