Fit decision boundary to logistic regression model in R

Question

I'm struggling to plot a decision boundary in R using ggplot.

I have 2 variables (exam scores) and a binary classification whether a student was admitted to school or not. The data looks like below:

> head(exam.data)
  Exam1Score Exam2Score Admitted
1   34.62366   78.02469        0
2   30.28671   43.89500        0
3   35.84741   72.90220        0
4   60.18260   86.30855        1
5   79.03274   75.34438        1
6   45.08328   56.31637        0

I can plot the data using ggplot:

exam.plot <- ggplot(data=exam.data, aes(x=Exam1Score, y=Exam2Score, col = ifelse(Admitted == 1,'dark green','red'), size=0.5))+
  geom_point()+
  labs(x="Exam 1 Scores", y="Exam 2 Scores", title="Exam Scores", colour="Exam Scores")+
  theme_bw()+
  theme(legend.position="none")

and then successfully fit the logistic regression model:

exam.lm <- glm(data=exam.data, formula=Admitted ~ Exam1Score + Exam2Score, family="binomial")

After much searching the web, I decided to fit the decision boundary manually (I did try for a while to do this with stat_smooth but couldn't get it to work). I tried the following:

# Fit the decision boundary
plot_x <- c(min(exam.data$Exam1Score)-2, max(exam.data$Exam1Score)+2)
plot_y <- (-1 /coef(exam.lm)[3]) * (coef(exam.lm)[2] * plot_x + coef(exam.lm)[1])
# One row per endpoint; rbind() would put the x values in one row and the y values in the other
db.data <- data.frame(x = plot_x, y = plot_y)

# Add the decision boundary plot
ggplot()+geom_line(data=db.data, aes(x=x, y=y))

which successfully plots the decision boundary, but I can't add it to my existing plot with:

> exam.plot+geom_line(data=db.data, aes(x=x, y=y))
Error: Aesthetics must either be length one, or the same length as the dataProblems:x, y

Can someone point out what I'm doing wrong or whether I can actually do this with +stat_smooth()?

All code (ex2.R) and files are here: https://github.com/StuHorsman/rscripts/tree/master/R/Coursera

Thanks!

Stuart

Update: I can achieve something similar with base graphics:

plot(exam.data$Exam1Score, exam.data$Exam2Score, type="n", xlab="Exam 1 Scores", ylab="Exam 2 Scores")      
points(exam.data$Exam1Score[exam.data$Admitted==1], exam.data$Exam2Score[exam.data$Admitted==1], pch=4, col="green")  
points(exam.data$Exam1Score[exam.data$Admitted==0], exam.data$Exam2Score[exam.data$Admitted==0], pch=4, col="red")        
lines(db.data, col="blue")

cbeleites · Accepted Answer

The problem is that in exam.plot you map not only the aesthetics x and y, but also col and size (the latter unnecessarily). Every layer inherits the aesthetics defined in the ggplot() call, so each layer must either be able to evaluate them on its own data or have them set explicitly. (I've often been caught by this problem.)

Thus:

exam.plot+geom_line(data=db.data, aes(x=x, y=y), col = "black", size = 1)

does plot.
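
Another way to get the same effect, as far as I know, is the inherit.aes = FALSE layer argument, which tells a layer to ignore the aesthetics from the ggplot() call and use only its own aes(). Below is a minimal sketch on made-up data; pts and line_df are hypothetical stand-ins, not the asker's data.

```r
library(ggplot2)

# Hypothetical stand-in data (not the asker's file)
set.seed(1)
pts <- data.frame(x1 = rnorm(20), x2 = rnorm(20), grp = rep(0:1, 10))
line_df <- data.frame(x = c(-2, 2), y = c(2, -2))

# The plot-level mapping includes colour, which refers to a column
# (grp) that line_df does not have
p <- ggplot(pts, aes(x = x1, y = x2,
                     colour = ifelse(grp == 1, "dark green", "red"))) +
  geom_point()

# inherit.aes = FALSE makes this layer use only its own x and y mapping
p_ok <- p + geom_line(data = line_df, aes(x = x, y = y), inherit.aes = FALSE)
print(p_ok)
```

Setting col and size on the layer, as above, works just as well; inherit.aes = FALSE is mostly convenient when the plot-level mapping has many aesthetics.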

However, I'd recommend changing exam.plot a bit and removing all aesthetics that do not apply for all layers (and put them into the layer definition instead):

exam.plot <- ggplot(data = exam.data, aes(x = Exam1Score, y = Exam2Score)) +
  # factor() makes Admitted discrete, which scale_color_manual() requires
  geom_point(aes(col = factor(Admitted)), size = 0.5) +
  scale_color_manual(values = c('red', 'dark green')) +
  labs(x = "Exam 1 Scores", y = "Exam 2 Scores", title = "Exam Scores", colour = "Exam Scores") +
  theme_bw() +
  coord_equal() +  # assuming that the scores have the same scale.
  theme(legend.position = "none")

exam.plot + geom_line(data=db.data, aes(x=x, y=y))
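
Since the boundary is a straight line, it can also be drawn with geom_abline() instead of a two-point data frame. The sketch below assumes a 0.5 probability cutoff, which is where the linear predictor b0 + b1*Exam1Score + b2*Exam2Score equals zero. It uses simulated scores (sim, fit, db_intercept and db_slope are hypothetical names, and the simulation coefficients are arbitrary), so it is an illustration rather than a reproduction of the asker's fit.

```r
library(ggplot2)

# Simulated stand-in for exam.data (hypothetical values)
set.seed(42)
n <- 200
sim <- data.frame(Exam1Score = runif(n, 30, 100),
                  Exam2Score = runif(n, 30, 100))
sim$Admitted <- rbinom(n, 1,
                       plogis(-12 + 0.1 * sim$Exam1Score + 0.1 * sim$Exam2Score))

fit <- glm(Admitted ~ Exam1Score + Exam2Score, data = sim, family = binomial)
b <- unname(coef(fit))

# p = 0.5 where b[1] + b[2]*x1 + b[3]*x2 = 0,
# i.e. x2 = -b[1]/b[3] - (b[2]/b[3]) * x1
db_intercept <- -b[1] / b[3]
db_slope <- -b[2] / b[3]

boundary.plot <- ggplot(sim, aes(x = Exam1Score, y = Exam2Score)) +
  geom_point(aes(colour = factor(Admitted))) +
  scale_colour_manual(values = c("red", "dark green")) +
  geom_abline(intercept = db_intercept, slope = db_slope) +
  theme_bw()
print(boundary.plot)
```

Because geom_abline() takes the slope and intercept as parameters, the line spans the whole panel and there is no need to pick x endpoints by hand.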

Which with example data

exam.data <- data.frame (Exam1Score = rnorm (100) + 0:1, 
                         Exam2Score = rnorm (100) + 0:1, 
                         Admitted = factor (rep (0:1, 50)))

yields:
example plot

(plotted with default size, 0.5 would hardly be visible for this example)
