jbryer

Change the name of a variable in a formula

I am trying to create a generic function to handle a data frame with multiple plausible values. I want to pass a formula to a function that performs a regression, such as:

f <- MRPCM ~ DSEX + IEP + ELL3 + SDRACEM + PARED

The MRPCM variable does not actually exist in the data frame. Instead five variables, MRPCM1, MRPCM2, MRPCM3, MRPCM4, and MRPCM5 do exist. What I want to do is iterate and update the formula (f here) to create five formulas. Can this be done? The update.formula function seems to work on the entire left or right side at a time. I should also note that in this example the variable I wish to change is the dependent variable so that update(f, MRPCM1 ~ .) works. However, I will not know where the variable appears in the formula.

For example:

f <- MRPCM + DSEX ~ IEP + ELL3 + SDRACEM + PARED

update.formula(f, as.formula('MRPCM1 ~ .'))

Results in this (note that DSEX is missing now):

MRPCM1 ~ IEP + ELL3 + SDRACEM + PARED

Answers (1)

Josh O'Brien

Here's a demonstration of one approach. A more sophisticated implementation might instead update the fitted linear model (see ?update), but that goes beyond the immediate scope of your question.

## Make a reproducible example!!
df <- 
setNames(as.data.frame(matrix(rnorm(96), ncol=8)), 
         c("MRPCM1","MRPCM2","MRPCM3","DSEX","IEP", "ELL3","SDRACEM","PARED"))

## Construct a template formula
f <- MRPCM ~ DSEX + IEP + ELL3 + SDRACEM + PARED

## Workhorse function
iterlm <- function(formula, data) {
    ## Find columns in data matching pattern on left hand side of formula
    LHSpat <- deparse(formula[[2]])
    LHSvars <- grep(LHSpat, names(data), value = TRUE)
    ## Run through matched columns, repeatedly updating the formula,
    ## fitting linear model, and extracting whatever results you want.
    ## (Uses the function's own arguments, not the globals f and df.)
    sapply(LHSvars, FUN=function(var) {
        uf <- update.formula(formula, as.formula(paste(var, "~ .")))
        coef(lm(uf, data))
    })
}

## Try it
iterlm(f, df)
##                  MRPCM1     MRPCM2      MRPCM3
## (Intercept)  0.71638942 -0.3883355  0.22202700
## DSEX        -0.07048994 -0.7478064  0.62590580
## IEP         -0.22716821 -0.2381982  0.12205780
## ELL3        -0.44492392  0.1720344  0.41251561
## SDRACEM      0.21629235  0.4800773  0.02866802
## PARED        0.07885683 -0.2582598 -0.07996121
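
The function above only handles a placeholder that makes up the whole left-hand side. If the variable can appear anywhere in the formula, as in the question's second example, one option is to substitute the symbol directly rather than go through update.formula. This is only a sketch: swap_var is a hypothetical helper name (not a base R function), and it assumes the placeholder appears as a bare variable name in the formula.

```r
## Hypothetical helper: replace every occurrence of the symbol `old`
## in `formula` with the symbol `new`, wherever it appears.
swap_var <- function(formula, old, new) {
    ## Named list mapping the old name to the new symbol
    repl <- setNames(list(as.name(new)), old)
    ## do.call() hands the formula object itself to substitute(),
    ## which then rewrites matching symbols inside it
    out <- do.call("substitute", list(formula, repl))
    ## Make sure the result is a formula in the original environment
    as.formula(out, env = environment(formula))
}

f <- MRPCM + DSEX ~ IEP + ELL3 + SDRACEM + PARED

## One formula per plausible value; DSEX stays on the left-hand side
forms <- lapply(paste0("MRPCM", 1:5), function(v) swap_var(f, "MRPCM", v))
```

Each element of forms can then be passed to lm() (or any other modelling function) with the data frame.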
