PortMadeleineCrumpet

Reputation: 115

Fitting multiple different regression lines with ggplot

For a very basic demonstration, I'm trying to show that the log-transformed linear model is the best one for a given set of data. To demonstrate that, I'd like to compare it graphically against a standard lm and a square-root transform, to show that the log-transformed model fits better than the other two. How do I create multiple overlapping lm lines in one plot? If I could label them, that would also be great.

Here is some sample data with a starter ggplot:

library(tidyverse)

# Simulate data where q depends on log(p)
p <- runif(100, 1, 100)
q <- 6 + 3 * log(p) + rnorm(100)
sample <- data.frame(p, q)

ggplot(data = sample) +
  geom_point(mapping = aes(x = p, y = q))

Upvotes: 1

Views: 775

Answers (2)

Ben Bolker

Reputation: 227081

This doesn't handle the labeling (you could use annotate() to add labels manually), but:

gg0 <- ggplot(data = sample, aes(x=p, y=q)) +  geom_point()
gg0 + geom_smooth(method="lm", formula=y~x) + 
      geom_smooth(method="lm", formula=y~log(x), colour="red") +
      geom_smooth(method="lm", formula=y~sqrt(x), colour="purple")
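For the labeling part, one option that avoids manual annotation is to map colour to a constant string inside aes() for each layer, so that ggplot2 builds a legend entry per model. The following is a minimal sketch under some assumptions: the data are re-simulated with an arbitrary seed, the legend title "Model", the colour choices, and the object name gg_labelled are illustrative, and se = FALSE is set only to keep the plot uncluttered.

```r
library(ggplot2)

# Simulated data in the same shape as the question's example
# (the seed is an arbitrary choice, not from the original post)
set.seed(1)
sample <- data.frame(p = runif(100, 1, 100))
sample$q <- 6 + 3 * log(sample$p) + rnorm(100)

# A constant string inside aes() becomes a legend entry for that layer
gg_labelled <- ggplot(sample, aes(x = p, y = q)) +
  geom_point() +
  geom_smooth(aes(colour = "linear"), method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(colour = "log"), method = "lm", formula = y ~ log(x), se = FALSE) +
  geom_smooth(aes(colour = "sqrt"), method = "lm", formula = y ~ sqrt(x), se = FALSE) +
  # Assign a colour to each label and give the legend a title
  scale_colour_manual(name = "Model",
                      values = c(linear = "blue", log = "red", sqrt = "purple"))

gg_labelled
```

Because the labels are ordinary scale values, the legend order and colours can be changed in scale_colour_manual() without touching the layers.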

Upvotes: 1

erocoar

Reputation: 5923

You could compute the lines yourself, e.g. like this:

# Make a tibble containing name of transform and the actual function
transforms <- tibble(Transform = c("log", "sqrt", "linear"),
                     Function = list(log, sqrt, function(x) x))

# Compute the regression coefs and turn it into a tidy table
lm_df <- transforms %>% 
  group_by(Transform) %>%
  group_modify(~ {
    lm(q ~ .x$Function[[1]](p), data = sample) %>%
      broom::tidy() %>%
      select(term, estimate) %>%
      pivot_longer(estimate) %>%
      mutate(Function = .x$Function)
  }) 

> lm_df
# A tibble: 6 x 5
# Groups:   Transform [3]
  Transform term                name       value Function
  <chr>     <chr>               <chr>      <dbl> <list>  
1 linear    (Intercept)         estimate 12.6    <fn>    
2 linear    .x$Function[[1]](p) estimate  0.0834 <fn>    
3 log       (Intercept)         estimate  5.89   <fn>    
4 log       .x$Function[[1]](p) estimate  2.99   <fn>    
5 sqrt      (Intercept)         estimate  9.35   <fn>    
6 sqrt      .x$Function[[1]](p) estimate  1.11   <fn>   

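# Evaluate the fitted functions on a grid of x values
# (the grid starts at 0, where log(0) is -Inf; start it at min(sample$p) to keep every point finite)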
lm_df <- lm_df %>%
  pivot_wider(names_from = term, values_from = value) %>%
  rename("Intercept" = `(Intercept)`, "Slope" = `.x$Function[[1]](p)`) %>%
  group_modify(~ {
    tibble(
      y = .x$Intercept + .x$Slope * .x$Function[[1]](seq(0, max(sample$p))),
      x = seq(0, max(sample$p))
    )
  }) 

> lm_df
# A tibble: 300 x 3
# Groups:   Transform [3]
   Transform     y     x
   <chr>     <dbl> <int>
 1 linear     12.6     0
 2 linear     12.7     1
 3 linear     12.8     2
 4 linear     12.9     3
 5 linear     12.9     4
 6 linear     13.0     5
 7 linear     13.1     6
 8 linear     13.2     7
 9 linear     13.3     8
10 linear     13.4     9
# ... with 290 more rows

# Plot the functions
ggplot() + 
  geom_point(data = sample, mapping = aes(x = p, y = q)) +
  geom_line(data = lm_df, aes(x = x, y = y, color = Transform))
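If the coefficient table itself is not needed, a shorter route to the same kind of plot might be to fit each model directly and let predict() evaluate it on a grid. This is only a sketch of that alternative, not the approach above: the names fits, grid, pred_df and gg_pred are made up for illustration, the data are re-simulated with an arbitrary seed, and the grid starts at min(p) so that log() stays finite.

```r
library(ggplot2)

# Simulated data in the same shape as the question's example
# (the seed is an arbitrary choice, not from the original post)
set.seed(1)
sample <- data.frame(p = runif(100, 1, 100))
sample$q <- 6 + 3 * log(sample$p) + rnorm(100)

# One fitted model per transform of the predictor
fits <- list(
  linear = lm(q ~ p, data = sample),
  log    = lm(q ~ log(p), data = sample),
  sqrt   = lm(q ~ sqrt(p), data = sample)
)

# Grid spanning the observed range of p (p >= 1 here, so log() is finite)
grid <- data.frame(p = seq(min(sample$p), max(sample$p), length.out = 200))

# Stack the predictions of each model into one long data frame
pred_df <- do.call(rbind, lapply(names(fits), function(nm) {
  data.frame(Transform = nm, x = grid$p, y = predict(fits[[nm]], newdata = grid))
}))

# Points from the raw data, one coloured line per model
gg_pred <- ggplot() +
  geom_point(data = sample, aes(x = p, y = q)) +
  geom_line(data = pred_df, aes(x = x, y = y, colour = Transform))

gg_pred
```

Since predict() applies the transform from each model formula itself, the transform functions do not have to be stored or re-applied by hand.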


Upvotes: 3
