Reputation: 89
I have some data:
library(ggplot2)
x <-c(2600248.25,1303899.14285714,1370136.33333333,353105.857142857, 145446.952380952,299032,75142.2631578947,40381.1818181818,6133.93103448276,975.234567901235,779.341463414634)
y <- c(4,7,6,14,21,9,19,22,29,81,41)
I am trying to fit a regression to this data and plot it. My issue is that when I use lm on the log values, then predict and plot, I get different results from stat_smooth. Consider the code:
fit0 <- lm(log(y) ~ log(x))
summary(fit0)
newx <- x
lm.fit <- predict(fit0, newdata = data.frame(x=newx), interval = "confidence")
df <- as.data.frame(cbind(x,y,lm.fit))
p <- ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method = "lm", formula ="y~x") + scale_x_log10() + scale_y_log10()
p <- p + geom_line(aes(y=fit)) # result too low
p <- p + geom_line(aes(y=10^fit)) # result too high
As shown, I have tried both the log-scale result and converting back with 10^x. Shouldn't the two linear models show the same values? What is wrong here, and how do I get the correct values?
(My end goal is to be able to plot prediction intervals.)
Upvotes: 2
Views: 1229
Reputation: 2956
You used a log10 scale in ggplot but log for the calculation. In R, log() on its own computes the natural logarithm. When you use log10() instead, you will see there is no difference between geom_smooth and lm. Since ggplot is just calling the lm routine on the log10-transformed values, the output is expected to be the same.
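To see why only the back-transform depends on the base, here is a minimal sketch using the data from the question. The names fit_ln, fit_10, pred_ln and pred_10 are made up for illustration. Because log10(v) = log(v) / log(10), both fits describe the same line: a natural-log fit has to be undone with exp(), a log10 fit with 10^.

```r
x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857,
       145446.952380952, 299032, 75142.2631578947, 40381.1818181818,
       6133.93103448276, 975.234567901235, 779.341463414634)
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

# Same model fitted with two different logarithm bases
fit_ln <- lm(log(y) ~ log(x))      # natural log
fit_10 <- lm(log10(y) ~ log10(x))  # base 10

# Back-transform each set of fitted values with the inverse of the log used
pred_ln <- exp(predict(fit_ln))
pred_10 <- 10^predict(fit_10)

# The fitted values agree on the original scale (up to floating-point error)
all.equal(unname(pred_ln), unname(pred_10))

# The slopes are identical; the intercepts differ by a factor of log(10)
coef(fit_ln)
coef(fit_10)
```

So the original natural-log fit could also be kept, as long as the line is drawn with exp(fit) rather than 10^fit.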
library(ggplot2)
x <-c(2600248.25,1303899.14285714,1370136.33333333,353105.857142857, 145446.952380952,299032,75142.2631578947,40381.1818181818,6133.93103448276,975.234567901235,779.341463414634)
y <- c(4,7,6,14,21,9,19,22,29,81,41)
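# Fit on the log10 scale so the model matches scale_x_log10() and scale_y_log10()
fit0 <- lm(log10(y) ~ log10(x))
summary(fit0)
newx <- x
# fit is a matrix with columns fit, lwr, upr, all on the log10 scale
fit <- predict(fit0, newdata = data.frame(x=newx), interval = "confidence")
df <- as.data.frame(cbind(x,y))
p <- ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method = "lm", formula = y ~ x) + scale_x_log10() + scale_y_log10()
# Back-transform the fitted values with 10^ before drawing them on the log10 axes
p <- p + geom_line(aes(y=10^fit[,1]))
p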
The black and blue lines overlap, so it is hard to see, but this is the output graph:
For further information, check the documentation:
log computes logarithms, by default natural logarithms, log10 computes common (i.e., base 10) logarithms, and log2 computes binary (i.e., base 2) logarithms. The general form log(x, base) computes logarithms with base base.
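For the prediction intervals mentioned in the question, the same idea should carry over: ask predict for interval = "prediction" on the log10 scale and back-transform all three columns with 10^. The following is a sketch under that assumption, not a definitive recipe. The names fit10, pred and df_pred are made up for illustration, and drawing the band with geom_ribbon is one possible choice.

```r
library(ggplot2)

x <- c(2600248.25, 1303899.14285714, 1370136.33333333, 353105.857142857,
       145446.952380952, 299032, 75142.2631578947, 40381.1818181818,
       6133.93103448276, 975.234567901235, 779.341463414634)
y <- c(4, 7, 6, 14, 21, 9, 19, 22, 29, 81, 41)

df <- data.frame(x = x, y = y)
df <- df[order(df$x), ]  # sort by x so the ribbon is drawn left to right

fit10 <- lm(log10(y) ~ log10(x), data = df)

# Prediction interval on the log10 scale, back-transformed with 10^
pred <- 10^predict(fit10, newdata = df, interval = "prediction")
df_pred <- cbind(df, as.data.frame(pred))  # adds columns fit, lwr, upr

p <- ggplot(df_pred, aes(x, y)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +  # prediction band
  geom_line(aes(y = fit)) +                                # fitted line
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p
```

The band is wider than the confidence band drawn by geom_smooth, because it describes where new observations are expected to fall rather than the uncertainty of the fitted line.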
Upvotes: 1
Reputation: 136
Run this code; I hope it answers your question.
Build the model (df is assumed to be a data frame holding x and y, as in the question):
model <- lm(y ~ x, df)
Predict the y values from the model and assign them to predicted:
predicted <- predict(model, newdata = df)
Plot both the predicted and the real values against x. This line draws the fit to the original values:
p <- ggplot(df, aes(x)) + scale_x_log10() + geom_smooth(method = 'lm', aes(y = y), col = 'red')
Add the predicted values to the same graph:
p <- p + geom_smooth(method = 'lm', aes(y = predicted), col = 'blue')
Upvotes: 1