How to use results from different regression models in a scatterplot built using group_by in R?

Question

I would like to add 2 different regression curves, coming from different models, in a scatter plot. Let's use the example below:

Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)

The scatter plot looks like this:

plot_df <- ggplot(df, aes(x = Weight, y = Increment, color=Id)) + geom_point()

I tested a linear and an exponential regression model and could extract the results following loki's answer there:

linear_df <- df %>% group_by(Id) %>% do(model = glance(lm(Increment ~ Weight,data = .))) %>% unnest(model)
exp_df <- df %>% group_by(Id) %>% do(model = glance(lm(log(Increment) ~ Weight,data = .))) %>% unnest(model)

The linear model fits better for the Ga group, the exponential one for the Aa group, and nothing for the Za one:

> linear_df
# A tibble: 3 x 13
  Id    r.squared adj.r.squared  sigma  statistic  p.value    df logLik    AIC    BIC deviance df.residual  nobs
                                               
1 Aa       0.656         0.637  15.1       34.4   1.50e- 5     1 -81.6  169.   172.   4106.             18    20
2 Ga       1.00          1.00    0.243 104113.    6.10e-32     1   1.01   3.98   6.65    0.942          16    18
3 Za       0.0471       -0.0262 26.7        0.642 4.37e- 1     1 -69.5  145.   147.   9283.             13    15

> exp_df
# A tibble: 3 x 13
  Id    r.squared adj.r.squared  sigma  statistic  p.value    df logLik     AIC    BIC deviance df.residual  nobs
                                                
1 Aa      0.999          0.999  0.0624 24757.     1.05e-29     1  28.2  -50.3   -47.4    0.0700          18    20
2 Ga      0.892          0.885  0.219    132.     3.86e- 9     1   2.87   0.264   2.94   0.766           16    18
3 Za      0.00444       -0.0721 0.941      0.0580 8.14e- 1     1 -19.3   44.6    46.7   11.5             13    15

Now, how can I draw the linear regression line for the Aa group, the exponential regression curve for the Ga group, and no curve for the Za group? There is this, but it applies for different regressions built inside the same model type. How can I combine my different objects?

G. Grothendieck · Accepted Answer

The formula shown below gives the same fitted values as does 3 separate fits for each Id so create the lm objects for each of the two models and then plot the points and the lines for each. The straight solid lines are the linear model and the curved dashed lines are the exponential model.

library(ggplot2)

fm.lin <- lm(Increment ~ Id/Weight + 0, df)
fm.exp <- lm(log(Increment) ~ Id/Weight + 0, df)

df %>%
  ggplot(aes(Weight, Increment, color=Id)) +
  geom_point() +
  geom_line(aes(y = fitted(fm.lin))) +
  geom_line(aes(y = exp(fitted(fm.exp))), lty = 2, lwd = 1)

To only show the Aa fitted lines for the linear model and Ga fitted lines for the exponential model NA out the portions not wanted. In this case we used solid lines for the fitted models.

df %>%
  ggplot(aes(Weight, Increment, color=Id)) +
  geom_point() +
  geom_line(aes(y = ifelse(Id == "Aa", fitted(fm.lin), NA))) +
  geom_line(aes(y = ifelse(Id == "Ga", exp(fitted(fm.exp)), NA)))

Added

Regarding the questions in the comments, the formula used above nests Weight within Id and effectively uses a model matrix which, modulo column order, is a block diagonal matrix whose blocks are the model matrices of the 3 individual models. Look at this to understand it.

model.matrix(fm.lin)

Since this is a single model rather than three models the summary statistics will be pooled. To get separate summary statistics use lmList from the nlme package (which comes with R so it does not have to be installed -- just issue a library statement). The statements below will give objects of class lmList that can be used in place of the ones above as they have a fitted method that will return the same fitted values.

library(nlme)
fm.lin2 <- lmList(Increment ~ Weight | Id, df, pool = FALSE)
fm.exp2 <- lmList(log(Increment) ~ Weight | Id, df, pool = FALSE)

In addition, they can be used to get individual summary statistics. Internally the lmList objects consist of a list of 3 lm objects with attributes in this case so we can extract the summary statistics by extracting the summary statistics from each component.

library(broom)
sapply(fm.lin2, glance)  
sapply(fm.exp2, glance)

One caveat is that common statistical tests between models using different dependent variables, Increment vs. log(Increment), are invalid.

How to use results from different regression models in a scatterplot built using group_by in R?

Answers (2)

Added

Related Questions