leo1

Reputation: 21

Variations in regression line slope for log-log ggplot

I am new to R. I am trying to fit a regression line in ggplot using geom_smooth(method=lm).

I noticed that, for the same dataset, the slope of the regression line changes when the axes are switched from linear-linear to linear-log and then to log-log. The change is large enough that the y-intercept differs and the line no longer seems to agree with the correlation.

The correlation coefficient (r) for the data is -0.125.

library(ggplot2)
library(scales)  # provides log2_trans()

conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)
mydata <- data.frame(conc, absp)

corr_eqn <- function(x, y) {
corr_coef <- round(cor(x, y), digits=3)
paste("italic(r) == ", corr_coef)
}
value <- data.frame(r=corr_eqn(conc, absp))

1. I first plotted my data on linear axes. Here the y-intercept appears to be ~3.1.

ggplot(mydata, aes(x=conc, y=absp)) +
 labs(x="Conc.", y="Absp.") + 
 scale_x_continuous() +
 scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7)) +
 geom_point(size=4) +
 theme(text = element_text(size=18, face="bold"), legend.position="none")+
 geom_smooth(method=lm, se=FALSE) +
 geom_text(data=value, aes(x=2, y=6, label=r), 
        colour="red", size=5, parse=TRUE)

2. But when I change the y-axis to a log2 scale and leave the x-axis linear, the y-intercept appears to be between 2 and 3.

ggplot(mydata, aes(x=conc, y=absp)) +
 labs(x="Conc.", y="Absp. (log2)") + 
 scale_x_continuous() +
 scale_y_continuous(trans=log2_trans(),breaks=c(0,1,2,3,4,5,6,7)) +
 annotation_logticks(sides="l") +
 geom_point(size=4) +
 theme(text = element_text(size=18, face="bold"), legend.position="none")+
 geom_smooth(method=lm, se=FALSE) +
 geom_text(data=value, aes(x=2, y=6, label=r), 
            colour="red", size=5, parse=TRUE)

3. Finally, when I change both the y- and x-axes to a log2 scale, the y-intercept drops below 2 and the slope now appears to be positive.

ggplot(mydata, aes(x=conc, y=absp)) +
 labs(x="Conc. (log2)", y="Absp. (log2)") + 
 scale_x_continuous(trans=log2_trans()) +
 scale_y_continuous(trans=log2_trans(),breaks=c(0,1,2,3,4,5,6,7)) +
 annotation_logticks(sides="lb") +
 geom_point(size=4) +
 theme(text = element_text(size=18, face="bold"), legend.position="none")+
 geom_smooth(method=lm, se=FALSE) +
 geom_text(data=value, aes(x=2, y=6, label=r), 
        colour="red", size=5, parse=TRUE)

[Image: scatter plots]

I do not understand how the slope or y-intercept can change just by switching the axes from linear to log scales. Have I made some obvious error here, or am I missing something? I would be grateful for any help or advice.

Upvotes: 2

Views: 1559

Answers (2)

leo1

Reputation: 21

Thank you @Roland for following this up with Hadley.

I have taken note of your suggestion of using stat_smooth instead of geom_smooth.

I also learned the reason for the observed variation in the slope of the regression line in log-linear and log-log plots from here. With scale_x_continuous(trans=log2_trans()) or scale_y_continuous(trans=log2_trans()), the data are transformed first and the regression is then fitted to the transformed values. With coord_trans(x="log2", y="log2"), the regression is fitted to the untransformed data and the result is then drawn in the transformed coordinates.
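The difference can be checked outside ggplot with plain lm() calls. The sketch below assumes that a scale transformation is equivalent to fitting the model on log2-transformed columns, which is how the behaviour is described above. The object names (fit_raw, fit_semilog, fit_loglog) are made up for illustration.

```r
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)
mydata <- data.frame(conc, absp)

# Model fitted to the untransformed data (the coord_trans() case)
fit_raw <- lm(absp ~ conc, data = mydata)

# Model fitted after transforming y only (log2 scale on the y-axis)
fit_semilog <- lm(log2(absp) ~ conc, data = mydata)

# Model fitted after transforming both variables (log2 scales on both axes)
fit_loglog <- lm(log2(absp) ~ log2(conc), data = mydata)

# Intercept and slope of each model, in the units the model was fitted in
print(coef(fit_raw))
print(coef(fit_semilog))
print(coef(fit_loglog))

# Semi-log intercept converted back to the original absp units
print(2^coef(fit_semilog)[1])
```

These are three different models, so their slopes and intercepts are not expected to agree. For these six points the log2-log2 slope comes out positive even though the raw-data slope is negative.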

My modified code now gives the correct value for the y-intercept (3.21) and a negative slope (in accordance with the negative correlation) for the log2-log2 plot.

ggplot(mydata, aes(x=conc, y=absp)) +
 labs(x="Conc. (log2)", y="Absp. (log2)") + 
 coord_trans(x="log2", y="log2") +
 geom_point(size=4) +
 stat_smooth(method=lm, se=FALSE, fullrange = TRUE) +
 theme(text = element_text(size=18, face="bold"), legend.position="none") +
 geom_text(data=value, aes(x=2, y=6, label=r), 
           colour="red", size=5, parse=TRUE)
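One related point: the r label in these plots is computed by cor() on the raw vectors, so it stays at -0.125 whichever axes are used. A minimal check, assuming Pearson correlation (the cor() default) and using illustrative names r_raw and r_loglog, shows that the correlation of the log2-transformed values is a different number:

```r
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)

# Pearson correlation of the untransformed data (the value printed on the plots)
r_raw <- cor(conc, absp)

# Pearson correlation after log2-transforming both variables
r_loglog <- cor(log2(conc), log2(absp))

print(round(r_raw, 3))
print(round(r_loglog, 3))
```

So a line fitted to the log2-transformed data should be compared with r_loglog, not with the raw-data r shown in the label.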


Upvotes: 0

Roland

Reputation: 132706

stat_smooth fits the model to the transformed data:

2^coef(lm(log2(absp) ~ conc, data = mydata))[1]
#(Intercept) 
#    2.405666 

ggplot(mydata, aes(x=conc, y=absp)) +
  labs(x="Conc.", y="Absp. (log2)") + 
  scale_x_continuous(limits = c(0, 3)) +
  scale_y_continuous(trans=log2_trans(),breaks=c(0,1,2,2.4056,3,4,5,6,7)) +
  annotation_logticks(sides="l") +
  geom_point(size=4) +
  theme(text = element_text(size=18, face="bold"), legend.position="none")+
  stat_smooth(method=lm, se=FALSE, fullrange = TRUE) +
  geom_text(data=value, aes(x=2, y=6, label=r), 
            colour="red", size=5, parse=TRUE)

[Image: resulting plot]

Arguably, this is a bug and should be reported.

Edit: Hadley says it's by design.
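If the scale-based log axis is wanted but the line should still come from the raw-data fit, one possible workaround is to compute the fitted values with lm() and predict() and draw them with geom_line(). This is only a sketch and was not proposed in the thread. The names fit_raw and line_df are illustrative, and the plotting part runs only if ggplot2 is installed.

```r
conc <- c(1.5188717, 1.8794363, 2.5455899, 1.5810686, 0.4938004, 2.9526288)
absp <- c(6.975519, 2.279606, 2.265391, 1.611868, 1.379097, 1.324827)
mydata <- data.frame(conc, absp)

# Fit the model on the untransformed data
fit_raw <- lm(absp ~ conc, data = mydata)

# Evaluate the fitted line on a fine grid over the observed x-range
line_df <- data.frame(conc = seq(min(conc), max(conc), length.out = 100))
line_df$absp <- predict(fit_raw, newdata = line_df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mydata, aes(x = conc, y = absp)) +
    geom_point(size = 4) +
    # Precomputed raw-data fit; it appears curved on the log2 axis
    geom_line(data = line_df, colour = "blue") +
    scale_y_continuous(trans = "log2")
  print(p)
}
```

Because the line is supplied as data, the scale transformation only changes where it is drawn, not how it was fitted.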

Upvotes: 1
