Deb.M

Reputation: 45

R: plotting geom_line() of lm() prediction values and geometric smooth do not coincide

I have the following data

df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000), 
                 y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))

After fitting a linear model to the data, I used the model to predict y values from known x values and stored the predicted y values in a data frame, "pred.fits":

fit <- lm(data = df, y ~ x)

pred.fits <- expand.grid(x=seq(1, 2000, length=2001))

pm <- predict(fit, newdata=pred.fits, interval="confidence")

pred.fits$py <- pm[,1]
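A quick base-R check of why the two lines agree on linear axes. This check is an editorial addition, not part of the original question. The predictions are just intercept + slope * x from the same least-squares fit that geom_smooth(method = lm) computes, so both layers draw the same straight line. The name `manual` is illustrative only.

```r
# Same data and model as in the question
df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))
fit <- lm(y ~ x, data = df)
pred.fits <- data.frame(x = seq(1, 2000, length.out = 2001))
pm <- predict(fit, newdata = pred.fits, interval = "confidence")
pred.fits$py <- pm[, "fit"]

# The prediction is a straight line: intercept + slope * x
b <- coef(fit)
manual <- unname(b[1] + b[2] * pred.fits$x)
print(all.equal(unname(pred.fits$py), manual))
```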

When I plot the data with both geom_smooth() and geom_line(), the two lines appear to coincide.

ggplot(df, aes(x=x, y=y)) + 
       geom_point() + 
       geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
       geom_line(data=pred.fits, aes(x=x, y=py), size=.2)

[Image: plot with linear axes]

However, when I plot the same data with both axes on a log scale, the two regression lines differ drastically.

ggplot(df, aes(x=x, y=y)) + 
       geom_point() + 
       geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
       geom_line(data=pred.fits, aes(x=x, y=py), size=.2) + 
       scale_x_log10() + 
       scale_y_log10()

[Image: plot with log10 axes]

Am I missing something here?
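One detail worth checking with log axes (an editorial addition, not from the original post): log10(0) is -Inf and the log of a negative number is NaN. As a result, those rows have no finite position on a log axis and cannot take part in a log-log fit. The base-R sketch below finds them. In this data set they are exactly the three x = 0 rows, which is why the filters used further down (df$x >= 1 in the update, df$x > 1 in the answer) remove them. The names `log_x`, `log_y` and `usable` are illustrative.

```r
df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))

# log10(0) is -Inf; log10() of a negative number is NaN (R warns, so silence it here)
log_x <- log10(df$x)
log_y <- suppressWarnings(log10(df$y))

# Rows that have a finite position on both log axes
usable <- is.finite(log_x) & is.finite(log_y)
print(df[!usable, ])

# In this data set the unusable rows are exactly the x = 0 rows
print(identical(which(usable), which(df$x >= 1)))
```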

UPDATE

After @Duck pointed me in the right direction, I was able to get it right. The issue was that I wanted the data to stay untransformed while the axes were transformed to a log10 scale. This is how I did it:

df2 <- df[df$x>=1,] # drop the x = 0 rows (log10(0) is -Inf; the negative y values are also in these rows)

fit2 <- lm(data = df2, log10(y) ~ log10(x))

pred.fits2 <- expand.grid(x=seq(10^0, 10^3  , length=200))

pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")

pred.fits2$py <-  10^pm2[,1] # convert the predicted y values to linear scale

ggplot(df2, aes(x=x, y=y)) + 
  geom_point() + 
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
  geom_line(data=pred.fits2, aes(x=x, y=py), size=1.5, linetype = "longdash") + 
  scale_x_log10() +
  scale_y_log10()

[Image: plot with log10 axes, smooth and predicted line together]
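A way to confirm why the two lines now coincide. This is a sketch assuming a reasonably recent ggplot2 and is not part of the original post. ggplot2 applies scale_x_log10() and scale_y_log10() before the stat runs, so geom_smooth(method = lm, formula = y ~ x) on log axes is a regression of log10(y) on log10(x). layer_data() returns the computed smooth in log10 units, and it should match predictions from fit2 at the same x positions. The names `p`, `smooth` and `manual` are illustrative.

```r
library(ggplot2)

df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))
df2 <- df[df$x >= 1, ]
fit2 <- lm(log10(y) ~ log10(x), data = df2)

p <- ggplot(df2, aes(x = x, y = y)) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  scale_y_log10()

# Computed data of layer 1 (the smooth); x and y are in log10 units
smooth <- layer_data(p, 1)

# Predict from the explicit log-log model at the same x positions
manual <- unname(predict(fit2, newdata = data.frame(x = 10^smooth$x)))
print(all.equal(smooth$y, manual))
```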

Thanks everyone for your help.
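As a possible extension of the update (not part of the original post), consider the confidence limits. pm2 was computed with interval = "confidence", so its lwr and upr columns can be back-transformed with 10^ in the same way as the fit column. Because 10^ is increasing, the band keeps its order on the original scale. Assuming roughly normal residuals on the log scale, the back-transformed line estimates the median of y at each x rather than its mean. The grid below (200 points from 1 to 1000) mirrors the update.

```r
df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))
df2 <- df[df$x >= 1, ]
fit2 <- lm(log10(y) ~ log10(x), data = df2)

pred.fits2 <- data.frame(x = seq(1, 1000, length.out = 200))
pm2 <- predict(fit2, newdata = pred.fits2, interval = "confidence")

# Back-transform the fit and both confidence limits to the original y scale
pred.fits2$py  <- 10^pm2[, "fit"]
pred.fits2$lwr <- 10^pm2[, "lwr"]
pred.fits2$upr <- 10^pm2[, "upr"]
head(pred.fits2)
```

To draw the band, a layer such as geom_ribbon(data = pred.fits2, aes(x = x, ymin = lwr, ymax = upr), inherit.aes = FALSE, alpha = 0.2) could be added to the plot above. inherit.aes = FALSE is needed because pred.fits2 has no y column for the plot's global aes(y = y).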

Upvotes: 0

Views: 1853

Answers (1)

Duck

Reputation: 39595

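This code may help your understanding (thanks to @BWilliams for the valuable comment). You want x and y on a log scale, and mixing a linear model with log-scaled axes can mess everything up. scale_x_log10() and scale_y_log10() transform the data before geom_smooth() fits its model, so on log axes the smooth is a regression of log(y) on log(x). Meanwhile, geom_line() still draws the predictions of the untransformed linear model. If you want the two to agree, it is better to train a separate model on the log variables and then plot it using the matching values. Here is an approach where we build a log-log model and then plot it. Rows with x <= 1, which include the x = 0 rows and the negative y values, are left out of a new data frame, df2. Here is the code: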

First linear model:

library(ggplot2)
#Data
df <- data.frame(x= c(0,1,10,100,1000,0,1, 10,100,1000,0,1,10,100,1000), 
                 y=c(7,15,135,1132,6459,-3,11,127,1120,6249,-5,13,126,1208,6208))

#Model 1 all obs
fit <- lm(data = df, y ~ x)
pred.fits <- expand.grid(x=seq(1, 2000, length=2001))
pm <- predict(fit, newdata=pred.fits, interval="confidence")
pred.fits$py <- pm[,1]
#Plot 1
ggplot(df, aes(x=x, y=y)) + 
  geom_point() + 
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
  geom_line(data=pred.fits, aes(x=x, y=py), size=.2)

Output:

[Image: output of Plot 1]
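To see why these two lines cannot agree on log axes, here is a base-R comparison (an editorial addition) of the untransformed linear model and a log10-log10 model at the observed x values. The linear fit is pulled toward the large x = 1000 observations, so at small x it sits far above the data. On a log scale that gap is very visible, which is what the question's second plot shows. The names `fit_lin`, `fit_log` and `comparison` are illustrative.

```r
df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))

fit_lin <- lm(y ~ x, data = df)                    # model from the question
df_pos  <- df[df$x >= 1, ]                         # rows usable on log axes
fit_log <- lm(log10(y) ~ log10(x), data = df_pos)  # log-log model

new_x <- data.frame(x = c(1, 10, 100, 1000))
comparison <- data.frame(
  x        = new_x$x,
  observed = as.numeric(tapply(df_pos$y, df_pos$x, mean)),  # mean y at each x
  linear   = unname(predict(fit_lin, newdata = new_x)),
  loglog   = unname(10^predict(fit_log, newdata = new_x))   # back to the y scale
)
print(comparison)
```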

Now the sketch for the log variables. Notice how log() is applied to both main variables and how the model is built:

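#First remove problematic values (x <= 1, which includes x = 0 and the negative y values)
df2 <- df[df$x>1,]
#Train a new model on the log variables
fit2 <- lm(data = df2, log(y) ~ log(x))
#Predict log(y) over a grid of x values
pred.fits2 <- expand.grid(x=seq(1, 2000, length=2001))
pm2 <- predict(fit2, newdata=pred.fits2, interval="confidence")
pred.fits2$py <- pm2[,1]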
#Plot 2
ggplot(df2, aes(x=log(x), y=log(y))) + 
  geom_point() + 
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, size=1.5) +
  geom_line(data=pred.fits2, aes(x=log(x), y=py), size=.2)

Output:

[Image: output of Plot 2]
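This answer uses natural logs (log()), while the question's update uses log10(). A short base-R check (an editorial addition, not from either post) shows that the choice of base does not change the fitted line. The slope is identical, and the intercept differs only by the factor log(10), since log(v) = log(10) * log10(v) for both x and y. The names `fit_ln` and `fit_l10` are illustrative.

```r
df <- data.frame(x = c(0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000, 0, 1, 10, 100, 1000),
                 y = c(7, 15, 135, 1132, 6459, -3, 11, 127, 1120, 6249, -5, 13, 126, 1208, 6208))
df2 <- df[df$x > 1, ]   # same filter as in the answer

fit_ln  <- lm(log(y) ~ log(x), data = df2)       # natural logs (answer)
fit_l10 <- lm(log10(y) ~ log10(x), data = df2)   # base-10 logs (update)

b_ln  <- unname(coef(fit_ln))
b_l10 <- unname(coef(fit_l10))

# Same slope; intercept scaled by log(10)
print(all.equal(b_ln[2], b_l10[2]))
print(all.equal(b_ln[1], log(10) * b_l10[1]))

# Back-transformed predictions are therefore identical
new_x <- data.frame(x = c(10, 100, 1000))
print(all.equal(exp(predict(fit_ln, newdata = new_x)),
                10^predict(fit_l10, newdata = new_x)))
```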

Upvotes: 1
