LRO

Reputation: 89

R difference between stat_smooth and lm (using log) in power regression?

I have some data:

    library(ggplot2)
    x <- c(2600248.25,1303899.14285714,1370136.33333333,353105.857142857, 145446.952380952,299032,75142.2631578947,40381.1818181818,6133.93103448276,975.234567901235,779.341463414634)
    y <- c(4,7,6,14,21,9,19,22,29,81,41)

I am trying to fit a regression to this data and plot it against the data. My issue is that when I use lm on the log values, then predict and plot, I get different results than with stat_smooth. Consider the code:

    fit0 <- lm(log(y) ~ log(x))
    summary(fit0)

    newx <- x
    lm.fit <- predict(fit0, newdata = data.frame(x=newx), interval = "confidence")
    df <- as.data.frame(cbind(x,y,lm.fit))

    p <- ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method = "lm", formula ="y~x") + scale_x_log10() + scale_y_log10()

    p <- p + geom_line(aes(y=fit)) # result too low
    p <- p + geom_line(aes(y=10^fit)) # result too high

As seen, I have tried both plotting the log result directly and converting it back using 10^x. Shouldn't the two linear models show the same values? What is wrong here, and how do I get the correct values?

(My end goal is to be able to plot prediction intervals.)

Upvotes: 2

Views: 1229

Answers (2)

mischva11

Reputation: 2956

You used a log10 scale in ggplot but log() for the calculation. In R, log() on its own computes the natural logarithm. If you use log10() instead, you will see there is no difference between geom_smooth and lm. Since ggplot is just calling the lm routine, the two outputs are expected to be the same.

    library(ggplot2)
    x <- c(2600248.25,1303899.14285714,1370136.33333333,353105.857142857, 145446.952380952,299032,75142.2631578947,40381.1818181818,6133.93103448276,975.234567901235,779.341463414634)
    y <- c(4,7,6,14,21,9,19,22,29,81,41)

    # Fit on the log10 scale so the model matches scale_x_log10() / scale_y_log10()
    fit0 <- lm(log10(y) ~ log10(x))
    summary(fit0)

    # Predicted values (with confidence interval), still on the log10 scale
    newx <- x
    fit <- predict(fit0, newdata = data.frame(x = newx), interval = "confidence")
    df <- as.data.frame(cbind(x, y))

    p <- ggplot(df, aes(x, y)) + geom_point() +
        geom_smooth(method = "lm", formula = y ~ x) +
        scale_x_log10() + scale_y_log10()
    # Back-transform the fitted column with 10^ before plotting it
    p <- p + geom_line(aes(y = 10^fit[, 1]))
    p

The black line (from lm) and the blue line (from geom_smooth) overlap in the output graph, so they are hard to tell apart.
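Since the question's stated end goal is prediction intervals, here is a minimal sketch of one way to draw them with the same log10 model. It assumes the interval is computed on the log10 scale and then back-transformed with 10^; the names dat, fit_log10, pred and p_pred are illustrative and not from the original post.

```r
library(ggplot2)

x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857,
       145446.952380952, 299032, 75142.2631578947, 40381.1818181818,
       6133.93103448276, 975.234567901235, 779.341463414634)
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)
dat <- data.frame(x = x, y = y)

# Fit on the log10 scale, matching scale_x_log10() / scale_y_log10()
fit_log10 <- lm(log10(y) ~ log10(x), data = dat)

# Prediction interval on the log10 scale; the columns are fit, lwr and upr
pred <- predict(fit_log10, newdata = dat, interval = "prediction")

# Back-transform to the original scale before plotting
dat$fit <- 10^pred[, "fit"]
dat$lwr <- 10^pred[, "lwr"]
dat$upr <- 10^pred[, "upr"]

p_pred <- ggplot(dat, aes(x, y)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +  # prediction band
  geom_point() +
  geom_line(aes(y = fit)) +                                # fitted line
  scale_x_log10() +
  scale_y_log10()
p_pred
```

The band here is wider than the confidence band drawn by geom_smooth, because a prediction interval also accounts for the scatter of individual observations around the fitted line.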


For further information, check the documentation for log:

log computes logarithms, by default natural logarithms, log10 computes common (i.e., base 10) logarithms, and log2 computes binary (i.e., base 2) logarithms. The general form log(x, base) computes logarithms with base base.
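The effect of the base can be checked directly on the data from the question. This is a small sketch (the names fit_ln and fit_log10 are mine, not from the original post) showing that the two fits share a slope, that their fitted values differ by a factor of log(10), and that each must be back-transformed with its own inverse.

```r
x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857,
       145446.952380952, 299032, 75142.2631578947, 40381.1818181818,
       6133.93103448276, 975.234567901235, 779.341463414634)
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

log(100)     # natural logarithm
log10(100)   # base-10 logarithm, equals 2

fit_ln    <- lm(log(y) ~ log(x))      # natural-log fit, as in the question
fit_log10 <- lm(log10(y) ~ log10(x))  # base-10 fit, as in this answer

# The slope does not depend on the base of the logarithm ...
all.equal(unname(coef(fit_ln)[2]), unname(coef(fit_log10)[2]))

# ... but the fitted values differ by a factor of log(10)
all.equal(fitted(fit_ln), fitted(fit_log10) * log(10))

# Back-transform with the matching inverse: exp() for log(), 10^ for log10()
all.equal(exp(fitted(fit_ln)), 10^fitted(fit_log10))
```

This is why the question's line drawn with 10^fit came out too high: 10^ was applied to natural-log predictions, which are larger than the base-10 ones by a factor of log(10).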

Upvotes: 1

sharmajee499

Reputation: 136

Run this code; I hope it answers your question. It assumes df is a data frame holding the x and y columns from the question.

    # Fit the model
    model <- lm(y ~ x, data = df)

    # Predict y from the fitted model and store the result in the data frame
    df$predicted <- predict(model, newdata = df)

    # Plot the original values: this layer draws the fit through the real data
    p <- ggplot(df, aes(x)) + scale_x_log10() +
        geom_smooth(method = "lm", aes(y = y), col = "red")

    # Add the fit through the predicted values to the same graph
    p <- p + geom_smooth(method = "lm", aes(y = predicted), col = "blue")


Upvotes: 1
