R difference between stat_smooth and lm (using log) in power regression?

Question

I have some data:

    library(ggplot2)
    x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857, 145446.952380952, 299032, 75142.2631578947, 40381.1818181818, 6133.93103448276, 975.234567901235, 779.341463414634)
    y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

I am trying to fit a regression to this data and plot the fit against it. When I use lm on the log values, then predict and plot, I get different results from stat_smooth. Consider the code:

    fit0 <- lm(log(y) ~ log(x))
    summary(fit0)

    newx <- x
    lm.fit <- predict(fit0, newdata = data.frame(x=newx), interval = "confidence")
    df <- as.data.frame(cbind(x,y,lm.fit))

    p <- ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method = "lm", formula ="y~x") + scale_x_log10() + scale_y_log10()

    p <- p + geom_line(aes(y=fit)) # result too low
    p <- p + geom_line(aes(y=10^fit)) # result too high

As shown, I have tried plotting both the log-scale result and the result converted back using 10^x. Shouldn't the two linear models give the same values? What is wrong here, and how do I get the correct values?

(My end goal is to be able to plot prediction intervals.)

mischva11 · Accepted Answer

You used a log10 scale in ggplot but log for the calculation. In R, log() on its own is the natural logarithm. When you use log10() instead, you will see that there is no difference between geom_smooth and lm. Since ggplot is just calling the lm routine, the output is expected to be the same.

    library(ggplot2)
    x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857, 145446.952380952, 299032, 75142.2631578947, 40381.1818181818, 6133.93103448276, 975.234567901235, 779.341463414634)
    y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

    # Fit on the log10 scale to match scale_x_log10() and scale_y_log10()
    fit0 <- lm(log10(y) ~ log10(x))
    summary(fit0)

    # predict() returns a matrix on the log10 scale (columns fit, lwr, upr)
    newx <- x
    fit <- predict(fit0, newdata = data.frame(x = newx), interval = "confidence")
    # Back-transform the fitted values and keep them in the plotting data frame
    df <- data.frame(x = x, y = y, fit = 10^fit[, "fit"])

    p <- ggplot(df, aes(x, y)) + geom_point() + geom_smooth(method = "lm", formula = y ~ x) + scale_x_log10() + scale_y_log10()
    p <- p + geom_line(aes(y = fit))
    p

The black and blue lines overlap in the output graph, so they are hard to tell apart.
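The question also mentions prediction intervals as the end goal. The following is a minimal sketch of one way to get them, not part of the original answer: it assumes the same log10 fit, asks predict() for interval = "prediction", and back-transforms the fit, lwr and upr columns with 10^. Drawing the band with geom_ribbon() is a choice made here for illustration.

```r
library(ggplot2)

x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857,
       145446.952380952, 299032, 75142.2631578947, 40381.1818181818,
       6133.93103448276, 975.234567901235, 779.341463414634)
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)
df <- data.frame(x = x, y = y)

# Fit on the log10 scale so the model matches the log10 axes
fit0 <- lm(log10(y) ~ log10(x), data = df)

# interval = "prediction" gives bounds for new observations;
# the returned matrix (columns fit, lwr, upr) is on the log10 scale
pred <- predict(fit0, newdata = df, interval = "prediction")

# Back-transform to the original scale of y
df$fit <- 10^pred[, "fit"]
df$lwr <- 10^pred[, "lwr"]
df$upr <- 10^pred[, "upr"]

p <- ggplot(df, aes(x, y)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +  # prediction band
  geom_point() +
  geom_line(aes(y = fit)) +                                # fitted line
  scale_x_log10() +
  scale_y_log10()
p
```

The band is symmetric around the line on the log axes but not on the original scale, because the bounds are computed in log10 units and then back-transformed.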

For further information, check the documentation for log:

log computes logarithms, by default natural logarithms, log10 computes common (i.e., base 10) logarithms, and log2 computes binary (i.e., base 2) logarithms. The general form log(x, base) computes logarithms with base base.
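A quick check of the point above, as an illustrative sketch using the y values from the question: the two logarithms differ by a constant factor, each has its own inverse, and mixing them (10^ applied to a natural-log value) overshoots, which is consistent with the "too high" line in the question.

```r
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

# log() is the natural logarithm; log10() is base 10.
# The two differ by the constant factor log(10).
all.equal(log(y), log10(y) * log(10))   # TRUE

# Each transform has its own inverse
all.equal(exp(log(y)), y)               # TRUE
all.equal(10^log10(y), y)               # TRUE

# Mixing them does not recover y:
# 10^log(y) equals y^log(10), roughly y^2.3
all.equal(10^log(y), y^log(10))         # TRUE
```

It follows that keeping lm(log(y) ~ log(x)) and back-transforming with exp() instead of 10^ should give the same fitted values as the log10 version.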
