I am trying to plot two regression lines on the same scatter plot, and it looks like I have it almost right using ggplot. One fit uses a second-order term. In the other fit, the inverse of hours is the dependent variable and the inverse of cases is the predictor. The data is as follows:
```r
df <- read.table(textConnection(
'hours cases
1275 230
1350 235
1650 250
2000 277
3750 522
4222 545
5018 625
6125 713
6200 735
8150 820
9975 992
12200 1322
12750 1900
13014 2022
13275 2155
'), header = TRUE)
```
I have the following, but it looks like the inverse regression fit is out of whack. What adjustment could be made to get the correct curve? I know the curve should be concave up and increasing.
```r
library(ggplot2)

ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape=21, size=3.2, fill="green", color="black") +
  geom_smooth(span=.4, method="lm", formula=y~x+I(x^2)) +
  geom_smooth(span=.4, method="lm", formula=I(1/y)~I(1/x))
```
For reference, I also made a scatter plot of the predicted hours against cases, where the y value of each point is the reciprocal of the fitted value of 1/hours. The code used to produce it was:
```r
fit <- lm(I(1/hours) ~ I(1/cases), data = df)
summary(fit)

hw <- theme(
  plot.title = element_text(hjust = 0.5, face = 'bold'),
  axis.title.y = element_text(angle = 0, vjust = .5, face = 'bold'),
  axis.title.x = element_text(face = 'bold'),
  plot.subtitle = element_text(hjust = 0.5),
  plot.caption = element_text(hjust = -.5),
  strip.text.y = element_blank(),
  strip.background = element_rect(fill = rgb(.9, .95, 1),
                                  colour = gray(.5), size = .2),
  # fill = NA leaves the border transparent so it does not hide the panel
  panel.border = element_rect(fill = NA, colour = gray(.70)),
  panel.grid.minor.y = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.spacing.x = unit(0.10, "cm"),
  panel.spacing.y = unit(0.05, "cm"),
  axis.ticks = element_blank(),
  axis.text = element_text(colour = "black"),
  axis.text.y = element_text(margin = margin(0, 3, 0, 3)),
  axis.text.x = element_text(margin = margin(-1, 0, 3, 0)),
  panel.background = element_rect(fill = "gray")
)

# y is the reciprocal of the fitted 1/hours, i.e. predicted hours
ggplot(df, aes(x = cases, y = 1/fitted(fit))) +
  geom_point(shape = 21, size = 3.2, fill = "green", color = "black") +
  labs(x = "Surgical Cases",
       y = "Predicted Worker Hours",
       title = "Predicted Worker Hours vs Surgical Cases") + hw
```
As @Roland said, you need to plot the actual model.
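But the problem is that geom_smooth does not back-transform a model with a transformed response: with formula = I(1/y) ~ I(1/x) it draws the fitted values of 1/hours, which are all close to zero, on the hours axis. You can move the back-transformation into the formula instead, but even though the formula below describes the right curve, it doesn't plot the right line, because lm() re-estimates an intercept and a slope for the transformed predictor on the hours scale.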
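Solving the reciprocal model for hours shows which curve should actually be drawn. Writing the fit as 1/hours = a + b/cases, with a the intercept and b the slope reported by summary(fit):

```latex
\begin{aligned}
\frac{1}{\text{hours}} &= a + \frac{b}{\text{cases}}
\;\Longrightarrow\;
\text{hours} = \frac{1}{a + b/\text{cases}} = \frac{\text{cases}}{a\,\text{cases} + b},\\
\frac{d\,\text{hours}}{d\,\text{cases}} &= \frac{b}{(a\,\text{cases} + b)^2},
\qquad
\frac{d^2\,\text{hours}}{d\,\text{cases}^2} = \frac{-2ab}{(a\,\text{cases} + b)^3}.
\end{aligned}
```

With b > 0, a < 0 and a·cases + b > 0 (as holds for the estimates quoted below over the observed range), the curve is increasing and concave up, which matches the expected shape. It has a vertical asymptote at cases = -b/a, roughly 3165 for those estimates, so it should not be extrapolated much beyond the largest observed count of 2155.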
Using summary(fit) to get the intercept a (-0.00005507) and the slope b (0.1743) of the reciprocal model, the back-transformed formula is:
```r
geom_smooth(span=.4, method="lm", formula=y~I(1/((1/x)*0.1743-0.00005507)))
```
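A minimal sketch of plotting the model directly instead: take a and b from coef(fit) rather than hard-coding them, and draw hours = 1/(a + b/cases) with stat_function(), so nothing is re-fitted on the hours scale. It rebuilds the question's data as a plain data.frame so it runs on its own; the helper name inv_curve is hypothetical.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  hours = c(1275, 1350, 1650, 2000, 3750, 4222, 5018, 6125,
            6200, 8150, 9975, 12200, 12750, 13014, 13275),
  cases = c(230, 235, 250, 277, 522, 545, 625, 713,
            735, 820, 992, 1322, 1900, 2022, 2155)
)

# Reciprocal model: 1/hours = a + b * (1/cases)
fit <- lm(I(1/hours) ~ I(1/cases), data = df)
a <- unname(coef(fit)[1])  # intercept on the 1/hours scale
b <- unname(coef(fit)[2])  # slope on the 1/hours scale

# The fitted values live on the 1/hours scale (all far below 1),
# which is why geom_smooth(formula = I(1/y) ~ I(1/x)) hugs zero
print(range(fitted(fit)))

# Hypothetical helper: back-transform the model to the hours scale
inv_curve <- function(x) 1 / (a + b / x)

p <- ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape = 21, size = 3.2, fill = "green", color = "black") +
  stat_function(fun = inv_curve, color = "blue")
print(p)
```

Because the coefficients are read from the fitted object, the curve stays correct if the data change, and it agrees with 1 / predict(fit, ...) at every point.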
This should get you started. Including the confidence interval would require additional work (for example, calculating the values for the confidence band outside of ggplot2). I'll leave that as an exercise for the reader.
```r
# Reciprocal model, fitted outside ggplot2
fit2 <- lm(I(1/hours)~I(1/cases), data = df)
ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape=21, size=3.2, fill="green", color="black") +
  geom_smooth(span=.4, method="lm", formula=y~x+I(x^2), aes(color = "polyn")) +
  # Back-transform the predicted 1/hours to the hours scale
  stat_function(fun = function(x) 1 / predict(fit2, newdata = data.frame(cases = x)),
                aes(color = "inv-inv"), size = 1)
```
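One way to do that exercise, sketched under the assumption that a pointwise 95% confidence band for the mean of 1/hours is what is wanted: compute the limits with predict(..., interval = "confidence") on the 1/hours scale, then take reciprocals. Because 1/x is decreasing, the lower limit for hours comes from the upper limit for 1/hours and vice versa. With this data the lower 1/hours limit can reach zero or below near the largest case counts, where the upper hours limit is unbounded; the sketch sets it to Inf there. The grid name newx, the 200-point grid and the y-axis zoom are illustrative choices.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  hours = c(1275, 1350, 1650, 2000, 3750, 4222, 5018, 6125,
            6200, 8150, 9975, 12200, 12750, 13014, 13275),
  cases = c(230, 235, 250, 277, 522, 545, 625, 713,
            735, 820, 992, 1322, 1900, 2022, 2155)
)

fit2 <- lm(I(1/hours) ~ I(1/cases), data = df)

# Grid of case counts spanning the observed range
newx <- data.frame(cases = seq(min(df$cases), max(df$cases), length.out = 200))

# Pointwise 95% confidence limits for the mean of 1/hours
ci <- predict(fit2, newdata = newx, interval = "confidence", level = 0.95)

# Back-transform with 1/x, which reverses the order of the limits.
# Where the lower 1/hours limit is <= 0 the upper hours limit is
# unbounded, so it is set to Inf (drawn at the top of the panel).
band <- data.frame(
  cases = newx$cases,
  fit   = 1 / ci[, "fit"],
  lower = 1 / ci[, "upr"],
  upper = ifelse(ci[, "lwr"] > 0, 1 / ci[, "lwr"], Inf)
)

p <- ggplot(df, aes(x = cases, y = hours)) +
  geom_ribbon(data = band, aes(x = cases, ymin = lower, ymax = upper),
              inherit.aes = FALSE, fill = "grey70", alpha = 0.5) +
  geom_line(data = band, aes(x = cases, y = fit), color = "blue") +
  geom_point(shape = 21, size = 3.2, fill = "green", color = "black") +
  # Zoom without dropping data; very large band values near the
  # right edge would otherwise stretch the y axis
  coord_cartesian(ylim = c(0, max(band$fit)))
print(p)
```

A band computed this way is not symmetric on the hours scale, which is expected after a nonlinear back-transformation.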