user3051065
user3051065

Reputation: 411

Extract Formula From lm with Coefficients (R)

I have an lm object and want to get the formula extracted with coefficients. I know how to extract the formula without coefficients, and how to get the coefficients without the formula, but not how to get eg. y ~ 10 + 1.25b as opposed to y~b or a table of what intercept, b etc. equal

This is the code I'm working with currently:

a = c(1, 2, 5)
b = c(12, 15, 20)

model = lm(a~b)
summary(model)
formula = formula(model)
formula
coefficients(model)

What I'd like to get from the above is y ~ -5.326 + .51b

Thanks

Edit: In my actual code I'm working with over 63 predictors and 18 different models, so I'd like something that can scale up without too much work.

Upvotes: 7

Views: 11032

Answers (4)

AlexB
AlexB

Reputation: 3269

An alternative with mixing broom and dplyr libraries:

    get_formula <- function(model) {
  
  broom::tidy(model)[, 1:2] %>%
    mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
    mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
    mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, ' * ', term))) %>%
    summarise(formula = paste(a, collapse = '')) %>%
    as.character
  
}

It works for both simple and multiple linear regressions:

model1 <- lm(hp ~ disp, data = mtcars)
model2 <- lm(hp ~ mpg, data = mtcars)
model3 <- lm(hp ~ disp + mpg, data = mtcars)

# > get_formula(model1)
# [1] "y ~ 45.73 + 0.44 * disp"
# > get_formula(model2)
# [1] "y ~ 324.08 - 8.83 * mpg"
# > get_formula(model3)
# [1] "y ~ 172.22 + 0.26 * disp - 4.27 * mpg"

Upvotes: 0

Scratch&#39;N&#39;Purr
Scratch&#39;N&#39;Purr

Reputation: 10399

I figured out a versatile way of creating the model formula with coefficients using substitution. It's a lot more versatile than manually building a string with paste0.

e.g.

I have a model that already has the optimized coefficients:

> model
Nonlinear regression model
  model: players ~ pop * (decay^days_from_start) + ycept
   data: data
      pop     decay     ycept 
6.896e+06 2.633e-01 4.300e+05 
 residual sum-of-squares: 1.64e+08

Number of iterations to convergence: 12 
Achieved convergence tolerance: 1.49e-08

These were the coefficients:

> coef(model)
         pop        decay        ycept 
6.896421e+06 2.632545e-01 4.300453e+05 

Putting it all together:

> newFormula = as.formula(substituteDirect(formula(model), as.list(coef(model))))
> newFormula
players ~ 6896421.4399627 * (0.263254460933212^days_from_start) + 430045.26142703

Upvotes: 3

maverik
maverik

Reputation: 786

Might I suggest an edit to lukeA's excellent answer:

as.formula(
  paste0("y ~ ", round(coefficients(model)[1],2), "", 
    paste(sprintf(" %+.2f*%s ", 
                  coefficients(model)[-1],  
                  names(coefficients(model)[-1])), 
          collapse="")
  )
)

This will make sure that negative coefficients are printed correctly

Suppose you land up with a negative coefficient for b, then the output will be

# y ~ -5.33 + -0.51 * b

instead of

# y ~ -5.33 - 0.51 * b

Upvotes: 10

lukeA
lukeA

Reputation: 54237

as.formula(
  paste0("y ~ ", round(coefficients(model)[1],2), " + ", 
    paste(sprintf("%.2f * %s", 
                  coefficients(model)[-1],  
                  names(coefficients(model)[-1])), 
          collapse=" + ")
  )
)
# y ~ -5.33 + 0.51 * b

Upvotes: 12

Related Questions