Remy M

Reputation: 619

Plotting an inverse regression curve using ggplot

I am trying to plot two regression lines on the same scatter plot, and I think I have it almost right using ggplot. One fit uses a second-order term; the other is a fit where the inverse of hours is the dependent variable and the inverse of cases is the predictor. The data is as follows:

df <- read.table(textConnection(
  'hours cases
   1275 230
   1350 235
   1650 250
   2000 277
   3750 522
   4222 545
   5018 625
   6125 713
   6200 735
   8150 820
   9975 992
   12200 1322
   12750 1900
   13014 2022
   13275 2155
  '), header = TRUE)

I have the following, but the inverse regression fit looks wrong. What adjustment would give the correct curve? I know the curve should be concave up and increasing.

ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape=21, size=3.2,fill="green",color="black")+
  geom_smooth(span=.4,method="lm",formula=y~x+I(x^2))+
  geom_smooth(span=.4,method="lm",formula=I(1/y)~I(1/x))

[Image: scatter plot with the two fitted lines]

For reference, here is a scatter plot of the predicted hours against cases, where the y axis is the reciprocal of the fitted values of 1/hours:

[Image: scatter plot of predicted worker hours against surgical cases]

The code used to produce this was

fit<-lm(I(1/hours)~I(1/cases),data=df)
summary(fit)

hw <- theme(
  plot.title=element_text(hjust=0.5,face='bold'),
  axis.title.y=element_text(angle=0,vjust=.5,face='bold'),
  axis.title.x=element_text(face='bold'),
  plot.subtitle=element_text(hjust=0.5),
  plot.caption=element_text(hjust=-.5),
  strip.text.y = element_blank(),
  strip.background=element_rect(fill=rgb(.9,.95,1),
                                colour=gray(.5), size=.2),
  panel.border=element_rect(fill=FALSE,colour=gray(.70)),
  panel.grid.minor.y = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.spacing.x = unit(0.10,"cm"),
  panel.spacing.y = unit(0.05,"cm"),

  axis.ticks=element_blank(),
  axis.text=element_text(colour="black"),
  axis.text.y=element_text(margin=margin(0,3,0,3)),
  axis.text.x=element_text(margin=margin(-1,0,3,0)),
  panel.background = element_rect(fill = "gray")
)

ggplot(df,aes(x=cases,y=1/fitted(fit))) +
  geom_point(shape=21, size=3.2,fill="green",color="black")+
  labs(x="Surgical Cases",
       y="Predicted Worker Hours",
       title="Predicted Worker Hours vs Surgical Cases")+hw

Upvotes: 3

Views: 1351

Answers (2)

Grubbmeister

Reputation: 877

As @Roland said, you need to plot the actual model.

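But the problem is that geom_smooth passes its formula to lm and fits a new model to the plotted x and y, so it cannot draw a curve whose coefficients are already fixed. So, even though the expression below is the correct back-transformation, geom_smooth re-estimates an intercept and slope around it and doesn't plot the right line.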

Using summary(fit) to get a (-0.00005507) and b (0.1743), the intercept and slope of the line:

  geom_smooth(span=.4,method="lm", formula=y~I(1/((1/x)*0.1743-0.00005507)))
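
One way around the refit is to compute the curve from the stored coefficients and hand it to stat_function, which draws a function as given. The following is only a sketch: it rebuilds the question's data so it runs on its own, and the helper name inv_curve is made up for illustration.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  hours = c(1275, 1350, 1650, 2000, 3750, 4222, 5018, 6125, 6200,
            8150, 9975, 12200, 12750, 13014, 13275),
  cases = c(230, 235, 250, 277, 522, 545, 625, 713, 735,
            820, 992, 1322, 1900, 2022, 2155)
)

fit <- lm(I(1/hours) ~ I(1/cases), data = df)
a <- unname(coef(fit)[1])  # intercept on the 1/hours scale
b <- unname(coef(fit)[2])  # slope on 1/cases

# Back-transform: 1/hours = a + b/cases  =>  hours = 1 / (a + b/cases)
# (inv_curve is a hypothetical helper, not part of ggplot2)
inv_curve <- function(x) 1 / (a + b / x)

p <- ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape = 21, size = 3.2, fill = "green", color = "black") +
  stat_function(fun = inv_curve)
p
```

Reading the coefficients with coef(fit) rather than typing them in from summary(fit) avoids rounding error in the plotted curve.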

Upvotes: 0

Roland

Reputation: 132864

This should get you started. Including the confidence interval would require additional work (e.g., calculate values for the confidence band outside of ggplot2). I'll leave that as an exercise for the reader.

library(ggplot2)

# Fit the inverse-inverse model outside of ggplot2
fit2 <- lm(I(1/hours)~I(1/cases), data = df)

ggplot(df, aes(x = cases, y = hours)) +
  geom_point(shape=21, size=3.2,fill="green",color="black")+
  geom_smooth(span=.4,method="lm",formula=y~x+I(x^2), aes(color = "polyn"))+
  # Draw the back-transformed predictions: hours = 1 / (predicted 1/hours)
  stat_function(fun = function(x) 1 / predict(fit2, newdata = data.frame(cases = x)),
                aes(color = "inv-inv"), size = 1)

[Image: resulting plot]
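
For the confidence band, one possible approach (a sketch, not the only way) is to get the interval from predict on the 1/hours scale and take reciprocals. Two things to watch: the reciprocal swaps the lower and upper bounds, and the back-transformed band is undefined wherever the lower bound on the 1/hours scale is not positive, and may become very wide as that bound approaches zero. The sketch below drops those grid points and zooms the y axis. The names grid, ci and band are illustrative, and the zoom factor is an arbitrary choice.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  hours = c(1275, 1350, 1650, 2000, 3750, 4222, 5018, 6125, 6200,
            8150, 9975, 12200, 12750, 13014, 13275),
  cases = c(230, 235, 250, 277, 522, 545, 625, 713, 735,
            820, 992, 1322, 1900, 2022, 2155)
)
fit2 <- lm(I(1/hours) ~ I(1/cases), data = df)

# Predict on a grid of cases, with a 95% confidence interval on the 1/hours scale
grid <- data.frame(cases = seq(min(df$cases), max(df$cases), length.out = 200))
ci <- predict(fit2, newdata = grid, interval = "confidence", level = 0.95)

# Back-transform; taking reciprocals reverses the order of the bounds
grid$hours <- 1 / ci[, "fit"]
grid$lower <- 1 / ci[, "upr"]
grid$upper <- 1 / ci[, "lwr"]

# Keep the band only where the interval on the 1/hours scale is strictly positive
band <- grid[ci[, "lwr"] > 0, ]

p <- ggplot(df, aes(x = cases, y = hours)) +
  geom_ribbon(data = band, aes(x = cases, ymin = lower, ymax = upper),
              inherit.aes = FALSE, alpha = 0.3) +
  geom_line(data = grid) +
  geom_point(shape = 21, size = 3.2, fill = "green", color = "black") +
  coord_cartesian(ylim = c(0, 3 * max(df$hours)))  # arbitrary zoom; the band can get very wide
p
```

This is a pointwise interval for the mean of 1/hours mapped back to the hours scale, so it is not symmetric around the curve.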

Upvotes: 2
