Logan Yang
Logan Yang

Reputation: 2584

How to set R formula programmatically

I'm trying to use the R package "ipw" for inverse probability weighting. I have some columns which are named "covar.1", "covar.2", "covar.3"... so I want to have a formula for them. From a previous question I got it to work with glm, matchit and other functions. But with ipw, it doesn't work. It works if I copy and paste manually what the print(f1) outputs at the denominator, so I tried without as.formula but it still doesn't work. To reproduce, run

library(ipw)

betaz <- c(0.75, -0.5,  0.25)
betay <- c(0.5, 1.0, -1.5)

X <- matrix(rnorm(3 * 250), 250)
ps <- pnorm(X %*% betaz)
Z <- rbinom(250, 1, ps)
epsilon <- rnorm(250, 0.0, 0.5)

Y0 <- X %*% betay + epsilon
Y1 <- X %*% betay + 0.5 + epsilon
Y <- Y0 * (1 - Z) + Y1 * Z
df <- data.frame(id = seq(250), covar = X, group = Z, metric = Y)
print(df[1:10,])

cols <- colnames(df)
covars <- cols[grep("covar", colnames(df))]
f <- as.formula(paste('group','~', paste(covars, collapse="+")))
psmodel <- glm(f, family = binomial(), data=df)
pscore <- psmodel$fitted.values

f1 <- as.formula(paste('~', paste(covars, collapse="+")))
print(f1)
weightmodel <- ipwpoint(
  exposure = group, family = "binomial", link = "logit", 
  denominator = f1,
  data = df, trunc = .01
)

With as.formula, it complains about object 'groupf1' not found. Not sure why it's doing such concatenation. Basically I need a way to set f1 dynamically using a variable.

From traceback I see the source code

glm(formula = eval(
  parse(
    text = paste(
      deparse(tempcall$exposure, width.cutoff = 500), 
      deparse(tempcall$denominator, width.cutoff = 500), sep = ""))), 
  family = lf, data = data, na.action = na.fail, ...)

R master help needed please. What form does this denominator want?

Upvotes: 1

Views: 535

Answers (1)

Noah
Noah

Reputation: 4414

ipw is written in such a way that it's very hard to enter a formula dynamically. This was one of the motivations I had to write the WeightIt package, which has the same functionality (in all but a few rare cases). In addition, in my cobalt package, there is the function f.build() which creates a formula from its inputs.

You can replace the last several lines of your code with the following:

f1 <- f.build("group", covars)
w.out <- weightit(f1, data = df, estimand = "ATE")
w.out2 <- trim(w.out, .01, lower = TRUE)

Here, f1 is your formula, created by f.build. This way you can cycle through multiple treatment variables in the first argument. The second argument can be either a vector of names of covariates or the data.frame of the covariates themselves. w.out is the weightit object containing the weights estimated by weightit(). The default is logistic regression, but this can be changed. (I noticed the true treatment propensities were generated using a probit model, which can be requested in weightit with link = "probit".)

It seems like you wanted to truncate the weights at the first and 99th percentiles, which is what trim does. By default, it only trims the highest weights, so I set lower = TRUE to also trim the lower weights. In general, you should check covariate balance and the variability of your weights before trimming in case the untrimmed weights are sufficient. cobalt is designed to assess balance and is compatible with WeightIt. Below is how you could assess balance on a weightit object:

bal.tab(w.out, un = TRUE)

You can compare the trimmed and untrimmed weights too:

bal.tab(f1, data = df, un = TRUE, 
         weights = list(untrimmed = w.out$weights,
                         trimmed = w.out2$weights))

When you're ready to estimate your treatment effect, you can just extract the weights from the weightit object. I use the jtools package to get robust standard errors, which are a must with PS weighting:

w1 <- w.out$weights
jtools::summ(lm(metric ~ group, data = df, weights = w1),
             robust = TRUE, confint = TRUE)

There is a good deal of documentation on WeightIt and cobalt. I hope you find them useful!

Upvotes: 1

Related Questions