I am new to R. I am trying to fit a regression line in ggplot using geom_smooth(method=lm).
I noticed that, for the same dataset, the regression line changes when the axes are switched from linear-linear to linear-log and then to log-log. The y-intercept is different each time, and the line no longer seems to agree with the correlation.
The correlation coefficient (r) for the data is -0.125.
library(ggplot2)
library(scales)  # provides log2_trans()

conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)
mydata <- data.frame(conc, absp)

# Build a plotmath label "italic(r) == <value>" from Pearson's r
corr_eqn <- function(x, y) {
  corr_coef <- round(cor(x, y), digits=3)
  paste("italic(r) == ", corr_coef)
}
value <- data.frame(r=corr_eqn(conc, absp))
1. I first plotted my data on linear scales. Here the y-intercept appears to be ~3.1.
ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc.", y="Absp.") +
  scale_x_continuous() +
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7)) +
  geom_point(size=4) +
  theme(text = element_text(size=18, face="bold"), legend.position="none") +
  geom_smooth(method=lm, se=FALSE) +
  geom_text(data=value, aes(x=2, y=6, label=r),
            colour="red", size=5, parse=TRUE)
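As a quick check outside ggplot2, the line drawn on linear axes should be the ordinary least-squares fit of `absp` on `conc` (the default formula for `method=lm` is `y ~ x`). A minimal base-R sketch; the object name `fit_lin` is only illustrative:

```r
# Data from the question
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)

# On linear axes, geom_smooth(method = lm) fits absp ~ conc on the raw values
fit_lin <- lm(absp ~ conc)
print(coef(fit_lin))              # intercept roughly 3.21, slope negative

# The same r that corr_eqn() puts in the plot label
print(round(cor(conc, absp), 3))  # -0.125
```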
2. But when I change the y-axis to a log2 scale and leave the x-axis linear, the y-intercept appears to be between 2 and 3.
ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc.", y="Absp. (log2)") +
  scale_x_continuous() +
  scale_y_continuous(trans=log2_trans(), breaks=c(0,1,2,3,4,5,6,7)) +
  annotation_logticks(sides="l") +
  geom_point(size=4) +
  theme(text = element_text(size=18, face="bold"), legend.position="none") +
  geom_smooth(method=lm, se=FALSE) +
  geom_text(data=value, aes(x=2, y=6, label=r),
            colour="red", size=5, parse=TRUE)
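With `trans=log2_trans()` on the y scale, the data are transformed before the statistic runs, so the line is the least-squares fit of `log2(absp)` on `conc`. Its intercept is therefore on the log2 scale and has to be back-transformed with `2^` before it can be read off the axis. A base-R sketch of that fit (names are illustrative):

```r
# Data from the question
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)

# Scale transformation: the model is fitted to log2(absp), not to absp
fit_logy <- lm(log2(absp) ~ conc)
b <- coef(fit_logy)

# Back-transform the intercept to absp units so it can be compared with the axis
intercept_absp <- 2^b[["(Intercept)"]]
print(intercept_absp)  # about 2.41, i.e. between 2 and 3
print(b[["conc"]])     # the slope is still negative
```

On linear axes this model is the exponential curve `absp = 2^(b0 + b1 * conc)`, which is a different line from the one in plot 1, so a different intercept is expected.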
3. Finally, when I change both the x- and y-axis to log2 scales, the y-intercept drops below 2 and the slope now appears to be positive.
ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc. (log2)", y="Absp. (log2)") +
  scale_x_continuous(trans=log2_trans()) +
  scale_y_continuous(trans=log2_trans(), breaks=c(0,1,2,3,4,5,6,7)) +
  annotation_logticks(sides="lb") +
  geom_point(size=4) +
  theme(text = element_text(size=18, face="bold"), legend.position="none") +
  geom_smooth(method=lm, se=FALSE) +
  geom_text(data=value, aes(x=2, y=6, label=r),
            colour="red", size=5, parse=TRUE)
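With both scales log2-transformed, the fitted model becomes `log2(absp) ~ log2(conc)`. Taking logs compresses the large value absp = 6.98 and spreads out the small value conc = 0.49, and for these six points that is enough to flip the sign of the least-squares slope. Note also that the `r` label is still computed by `corr_eqn()` from the raw values, so it no longer describes the data the line was fitted to. A base-R sketch (names are illustrative):

```r
# Data from the question
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)

# Both scales transformed: the statistic sees log2 of both variables
fit_loglog <- lm(log2(absp) ~ log2(conc))
print(coef(fit_loglog))  # the slope is positive for these six points

# r of the raw values (what the label shows) vs. r of the values the line was fitted to
r_raw <- cor(conc, absp)
r_log <- cor(log2(conc), log2(absp))
print(c(raw = r_raw, log2 = r_log))
```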
I don't understand how the slope or y-intercept can change just by switching the axes from linear to log scales. Have I made an obvious error, or am I missing something? I would be grateful for any help or advice.
Thank you @Roland for following this up with Hadley.
I have taken note of your suggestion to use stat_smooth instead of geom_smooth.
I also learned the reason for these variations in the slope of the regression line in log-linear and log-log plots from here. When using scale_x_continuous(trans=log2_trans()) or scale_y_continuous(trans=log2_trans()), the data are transformed first and the regression is fitted to the transformed values. When using coord_trans(x="log2", y="log2"), the regression is fitted to the untransformed data first, and the resulting line is then drawn in the transformed coordinates.
My modified code now gives the expected y-intercept (3.21, from the fit to the untransformed data) and a negative slope (consistent with the negative correlation) for the log2-log2 plot.
ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc. (log2)", y="Absp. (log2)") +
  coord_trans(x="log2", y="log2") +
  geom_point(size=4) +
  stat_smooth(method=lm, se=FALSE, fullrange = TRUE) +
  theme(text = element_text(size=18, face="bold"), legend.position="none") +
  # 'value' is the label data frame created in the question
  geom_text(data=value, aes(x=2, y=6, label=r),
            colour="red", size=5, parse=TRUE)
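To see what coord_trans() does differently, the sketch below fits the raw-scale model and then maps a few points of the fitted line through log2, which is roughly what happens when the line is drawn. Because the intercept (about 3.21) is not zero, a straight line in raw units is not straight in log-log units, so the drawn fit bends. Keep in mind that conc = 0 is not on a log2 x-axis at all, so the 3.21 intercept is a property of the raw-scale model rather than something read off this plot. Base R only; the names `conc_grid` and `drawn` are illustrative:

```r
# Data from the question
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)

# coord_trans(): the model is fitted to the untransformed values ...
fit_raw <- lm(absp ~ conc)

# ... and only the finished line is mapped through log2 when it is drawn
conc_grid <- data.frame(conc = seq(min(conc), max(conc), length.out = 5))
pred  <- predict(fit_raw, newdata = conc_grid)
drawn <- data.frame(x = log2(conc_grid$conc), y = log2(pred))
print(drawn)

# Segment slopes differ, so the straight raw-scale fit appears curved
seg_slopes <- diff(drawn$y) / diff(drawn$x)
print(seg_slopes)
```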
stat_smooth fits the model to the transformed data:
2^coef(lm(log2(absp) ~ conc, data = mydata))[1]
#(Intercept)
# 2.405666
ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc.", y="Absp. (log2)") +
  scale_x_continuous(limits = c(0, 3)) +
  scale_y_continuous(trans=log2_trans(), breaks=c(0,1,2,2.4056,3,4,5,6,7)) +
  annotation_logticks(sides="l") +
  geom_point(size=4) +
  theme(text = element_text(size=18, face="bold"), legend.position="none") +
  stat_smooth(method=lm, se=FALSE, fullrange = TRUE) +
  geom_text(data=value, aes(x=2, y=6, label=r),
            colour="red", size=5, parse=TRUE)
Arguably, this is a bug and should be reported.
Edit: Hadley says it's by design.