wh41e

Reputation: 193

How to create and use a formula inside a function in R?

I am trying to write a reusable function. Inside it, I would like to build a formula from the arguments, fit that formula with lm, and output the summary of the regression results.

I have tried using as.formula to build the formula inside my function, but the following code gives error messages and I am not sure why. Could anyone help?

# create the data
x <- c(1,2,3,5,6,7,8,1,1,2,1)
y <- c(2,3,4,5,1,3,4,5,6,7,2)
z <- c(2,3,4,1,2,3,33,5,2,4,5)
i <- c(2,4,4,5,1,3,2,5,6,7,2)
j <- c(2,9,4,1,2,3,4,5,2,4,5)
k <- c(2,12,4,5,1,3,4,5,6,7,2)
q <- c(2,55,4,1,2,5,4,5,2,4,5)
m <- data.frame(x,y,z)

# the function
polyRegress <- function(pre1, pre2, dv, df){

  # This is the formula I want to test:
  # model <- lm(z ~ x + y + I(x^2) + I(x*y) + I(y^2), data=m)

  f <- as.formula(paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)")

  results <- lm(f, data=df)
  summary(results)
}

# main
polyRegress(x, y, z, m)
polyRegress(i, j, k, m)

Also, in the output of the two polyRegress calls above, I want the coefficients to be named x, y, I(x^2), I(x * y), I(y^2) and i, j, I(i^2), I(i * j), I(j^2), rather than pre1, pre2, I(pre1^2), I(pre1 * pre2), I(pre2^2).

Upvotes: 1

Views: 2688

Answers (1)

Sylvain Berthelot

Reputation: 141


In your example I think you don't need the df argument, because x, y, z, i, ... are vectors in your workspace.
When you call polyRegress(x, y, z, m) you pass the vectors x, y and z themselves, not the column names of m.
So, in this first case, you can use substitute to recover the argument names, which also gives the coefficients the names you want.

# create the data
x <- c(1,2,3,5,6,7,8,1,1,2,1)
y <- c(2,3,4,5,1,3,4,5,6,7,2)
z <- c(2,3,4,1,2,3,33,5,2,4,5)
i <- c(2,4,4,5,1,3,2,5,6,7,2)
j <- c(2,9,4,1,2,3,4,5,2,4,5)
k <- c(2,12,4,5,1,3,4,5,6,7,2)
q <- c(2,55,4,1,2,5,4,5,2,4,5)
m <- data.frame(x,y,z)

# the function
polyRegress <- function(pre1, pre2, dv){
  # replace each argument with the name it was called with, e.g. "x" or "i"
  pre1 <- deparse(substitute(pre1))
  pre2 <- deparse(substitute(pre2))
  dv <- deparse(substitute(dv))

  f <- paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)")

  results <- lm(f)
  # at this point results$call is lm(formula = f); replace it so the summary shows the real formula
  results$call <- call('lm', formula = formula(f))
  summary(results)
}

# main
polyRegress(x, y, z)
polyRegress(i, j, k)
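
To see why the original version failed and what substitute changes, here is a minimal sketch. The helper arg_name is hypothetical and exists only for this illustration; the short vectors are made-up values, not the question's data.

```r
# Hypothetical helper (not part of the answer's function): returns the
# name of the variable it was called with, as a character string.
arg_name <- function(v) deparse(substitute(v))

# Made-up toy vectors, for illustration only
x <- c(1, 2, 3)
z <- c(2, 3, 4)

# Pasting the vectors themselves gives one string per element,
# so the result cannot be turned into a single formula.
pasted_values <- paste0(z, " ~ ", x)
length(pasted_values)   # 3

# Pasting the argument names gives exactly one formula string.
pasted_names <- paste0(arg_name(z), " ~ ", arg_name(x))
pasted_names            # "z ~ x"
```

paste0 is vectorized, so passing the numeric vectors directly produces as many strings as there are observations; capturing the names first keeps the formula text to a single string.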

But if you really want to use the variables stored in your data frame, you have to pass the arguments as character strings, because what you need are the data frame's column names.

# create the data
m <- data.frame(x,y,z,i,j,k)
rm(x,y,z,i,j,k)

# the function
polyRegress <- function(pre1, pre2, dv, df){
  f <- paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)")

  results <- lm(f, data = df)
  # at this point results$call is lm(formula = f, data = df); replace it so the summary shows the real formula and data name
  results$call <- call('lm', formula = formula(f), data = substitute(df))
  summary(results)
}

# main
polyRegress("x", "y", "z", m)
polyRegress("i", "j", "k", m)
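
As a rough check that the character-string approach yields the coefficient names asked for, the sketch below builds the formula in a separate helper and inspects names(coef(...)). The helper name poly_formula and the use of sprintf are my own choices for illustration, not part of the function above; the data are the x, y, z values from the question.

```r
# Data from the question, stored only inside the data frame
m <- data.frame(
  x = c(1, 2, 3, 5, 6, 7, 8, 1, 1, 2, 1),
  y = c(2, 3, 4, 5, 1, 3, 4, 5, 6, 7, 2),
  z = c(2, 3, 4, 1, 2, 3, 33, 5, 2, 4, 5)
)

# Hypothetical helper: builds the second-order polynomial formula
# from three column names given as character strings.
poly_formula <- function(pre1, pre2, dv) {
  as.formula(sprintf(
    "%s ~ %s + %s + I(%s^2) + I(%s * %s) + I(%s^2)",
    dv, pre1, pre2, pre1, pre1, pre2, pre2
  ))
}

f <- poly_formula("x", "y", "z")
fit <- lm(f, data = m)

# The coefficients carry the column names, not pre1 / pre2
coef_names <- names(coef(fit))
coef_names
```

Because the formula is assembled from the column names before lm sees it, the terms are labelled with x and y directly, so no renaming of the coefficients is needed afterwards.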

I hope I understood your question correctly.

Upvotes: 1
