Matthew Hui

Reputation: 664

Cannot overlay multiple stat_function with ggplot2

I have a table with a binning variable VAR2_BY_NS_BIN and an x-y data pair (MP_BIN, CORRECT_PROP). I want to plot the data points for each bin and also draw a different line for each bin with stat_function, using that bin's value as the reference in each iteration of a for loop.

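library(data.table) # provides data.table()
library(ggplot2)    # provides ggplot(), geom_point(), stat_function()

test_tab <- data.table(VAR2_BY_NS_BIN=c(0.0005478, 0.0005478, 0.002266, 0.002266, 0.006783, 0.006783, 0.020709, 0.020709, 0.142961, 0.142961),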
                       MP_BIN=rep(c(0.505, 0.995), 5),
                       CORRECT_PROP=c(0.5082, 0.7496, 0.5024, 0.8627, 0.4878, 0.9368, 0.4979, 0.9826, 0.4811, 0.9989))

VAR2_BIN <- sort(unique(test_tab$VAR2_BY_NS_BIN)) #get unique bin values
LEN_VAR2_BIN <- length(VAR2_BIN) #get number of bins

col_base <- c("#FF0000", "#BB0033", "#880088", "#3300BB", "#0000FF") #mark bins with different colours

p <- ggplot(data = test_tab)

for (i in 1:LEN_VAR2_BIN) {
  p <- p + geom_point(data = test_tab[test_tab$VAR2_BY_NS_BIN==VAR2_BIN[i],],
                      aes(x = MP_BIN, y = CORRECT_PROP),
                      col = col_base[i],
                      alpha = 0.5) +
           stat_function(fun = function(t) {VAR2_BIN[i]*(t-0.5)+0.5}, col = col_base[i])
}

p <- p + xlab("MP") + ylab("Observed proportion")
print(p)

However, the code above (a reproducible example) always produces a plot in which only the last stat_function line is drawn (the 5th line in this case).

The following code, which does not use the for loop, works, but I have a large number of bins, so writing out one call per bin by hand is not feasible.

p <- p + stat_function(fun = function(t) {VAR2_BIN[1]*(t-0.5)+0.5}, col = col_base[1])
p <- p + stat_function(fun = function(t) {VAR2_BIN[2]*(t-0.5)+0.5}, col = col_base[2])
p <- p + stat_function(fun = function(t) {VAR2_BIN[3]*(t-0.5)+0.5}, col = col_base[3])
p <- p + stat_function(fun = function(t) {VAR2_BIN[4]*(t-0.5)+0.5}, col = col_base[4])
p <- p + stat_function(fun = function(t) {VAR2_BIN[5]*(t-0.5)+0.5}, col = col_base[5])

Thanks in advance!

Upvotes: 2

Views: 419

Answers (1)

eipi10

Reputation: 93761

You don't need a for loop or stat_function. To plot the points, just map MP_BIN and CORRECT_PROP to x and y and the points can be plotted with a single call to geom_point. For the lines, you can create the necessary values on the fly (as done in the code below) and plot those with geom_line.

library(tidyverse)

ggplot(test_tab %>% mutate(model=VAR2_BY_NS_BIN*(MP_BIN - 0.5) + 0.5), 
       aes(x=MP_BIN, colour=factor(VAR2_BY_NS_BIN))) +
  geom_point(aes(y=CORRECT_PROP)) +
  geom_line(aes(y=model)) +
  labs(colour="VAR2_BY_NS_BIN") +
  guides(colour=guide_legend(reverse=TRUE))

As for the problem with the for loop: the function you pass to stat_function refers to the loop variable i, but ggplot doesn't actually evaluate that function until you print the plot. By then the loop has finished and i is 5, so every layer draws the same fifth line on top of the others, and that's the only line you see. You can find several questions related to this issue on Stack Overflow.
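If you do want to keep the loop and stat_function, here is a minimal sketch of one way around the problem. It first reproduces the late lookup of i in plain R, then avoids it with a function factory. make_line is a hypothetical helper name (not part of ggplot2), and the data frame passed to ggplot() is only an assumption used to set the x range.

```r
library(ggplot2)

# Bin values and colours taken from the question.
VAR2_BIN <- c(0.0005478, 0.002266, 0.006783, 0.020709, 0.142961)
col_base <- c("#FF0000", "#BB0033", "#880088", "#3300BB", "#0000FF")

# 1. The problem in isolation: every function created in the loop looks up
#    the same variable i, which holds its final value once the loop ends.
lazy_funs <- list()
for (i in seq_along(VAR2_BIN)) {
  lazy_funs[[i]] <- function(t) VAR2_BIN[i] * (t - 0.5) + 0.5
}
# All five functions now use VAR2_BIN[5], so the first and last agree.
identical(lazy_funs[[1]](1), lazy_funs[[5]](1))  # TRUE

# 2. One possible fix: a function factory (hypothetical helper).
#    force() evaluates slope immediately, so each returned function
#    keeps its own copy of the value instead of looking up i later.
make_line <- function(slope) {
  force(slope)
  function(t) slope * (t - 0.5) + 0.5
}

# Assumed x range for illustration; the question's plot gets it from MP_BIN.
p <- ggplot(data.frame(x = c(0.5, 1)), aes(x = x))
for (i in seq_along(VAR2_BIN)) {
  p <- p + stat_function(fun = make_line(VAR2_BIN[i]), colour = col_base[i])
}
print(p)
```

With the factory, each layer carries its own slope, so the five lines should no longer collapse onto the last one. For straight lines like these, the geom_line approach above is still simpler.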

Upvotes: 3
