Reputation: 2584
I'm trying to use the R package "ipw" for inverse probability weighting. I have some columns which are named "covar.1", "covar.2", "covar.3"... so I want to have a formula for them. From a previous question I got it to work with glm
, matchit
and other functions. But with ipw, it doesn't work. It works if I copy and paste manually what the print(f1)
outputs at the denominator, so I tried without as.formula
but it still doesn't work. To reproduce, run
library(ipw)
betaz <- c(0.75, -0.5, 0.25)
betay <- c(0.5, 1.0, -1.5)
X <- matrix(rnorm(3 * 250), 250)
ps <- pnorm(X %*% betaz)
Z <- rbinom(250, 1, ps)
epsilon <- rnorm(250, 0.0, 0.5)
Y0 <- X %*% betay + epsilon
Y1 <- X %*% betay + 0.5 + epsilon
Y <- Y0 * (1 - Z) + Y1 * Z
df <- data.frame(id = seq(250), covar = X, group = Z, metric = Y)
print(df[1:10,])
cols <- colnames(df)
covars <- cols[grep("covar", colnames(df))]
f <- as.formula(paste('group','~', paste(covars, collapse="+")))
psmodel <- glm(f, family = binomial(), data=df)
pscore <- psmodel$fitted.values
f1 <- as.formula(paste('~', paste(covars, collapse="+")))
print(f1)
weightmodel <- ipwpoint(
exposure = group, family = "binomial", link = "logit",
denominator = f1,
data = df, trunc = .01
)
With as.formula
, it complains about object 'groupf1' not found
. Not sure why it's doing such concatenation. Basically I need a way to set f1 dynamically using a variable.
From traceback I see the source code
glm(formula = eval(
parse(
text = paste(
deparse(tempcall$exposure, width.cutoff = 500),
deparse(tempcall$denominator, width.cutoff = 500), sep = ""))),
family = lf, data = data, na.action = na.fail, ...)
R master help needed please. What form does this denominator want?
Upvotes: 1
Views: 535
Reputation: 4414
ipw
is written in such a way that it's very hard to enter a formula dynamically. This was one of the motivations I had to write the WeightIt
package, which has the same functionality (in all but a few rare cases). In addition, in my cobalt
package, there is the function f.build()
which creates a formula from its inputs.
You can replace the last several lines of your code with the following:
f1 <- f.build("group", covars)
w.out <- weightit(f1, data = df, estimand = "ATE")
w.out2 <- trim(w.out, .01, lower = TRUE)
Here, f1
is your formula, created by f.build
. This way you can cycle through multiple treatment variables in the first argument. The second argument can be either a vector of names of covariates or the data.frame of the covariates themselves. w.out
is the weightit
object containing the weights estimated by weightit()
. The default is logistic regression, but this can be changed. (I noticed the true treatment propensities were generated using a probit model, which can be requested in weightit
with link = "probit"
.)
It seems like you wanted to truncate the weights at the first and 99th percentiles, which is what trim
does. By default, it only trims the highest weights, so I set lower = TRUE
to also trim the lower weights. In general, you should check covariate balance and the variability of your weights before trimming in case the untrimmed weights are sufficient. cobalt
is designed to assess balance and is compatible with WeightIt
. Below is how you could assess balance on a weightit
object:
bal.tab(w.out, un = TRUE)
You can compare the trimmed and untrimmed weights too:
bal.tab(f1, data = df, un = TRUE,
weights = list(untrimmed = w.out$weights,
trimmed = w.out2$weights))
When you're ready to estimate your treatment effect, you can just extract the weights from the weightit
object. I use the jtools
package to get robust standard errors, which are a must with PS weighting:
w1 <- w.out$weights
jtools::summ(lm(metric ~ group, data = df, weights = w1),
robust = TRUE, confint = TRUE)
There is a good deal of documentation on WeightIt
and cobalt
. I hope you find them useful!
Upvotes: 1