R package MatchIt with factor variables

Question

I'm using the R package MatchIt to calculate propensity score weights for use in a straightforward survival analysis, and I'm seeing very different behavior depending on whether some of the covariates entering the propensity score calculation are factors or numeric.

An example: simple code with 3 covariates, one numeric (tumor size) and two that could be factors (say tumor stage and smoking habits). The treatment variable is a factor indicating the type of surgery.

Example 1: with stage as factor and smoking habit as integer,

> sapply(surg.data[,confounders], class)
tumor_size  TNM.STAGE smoking_hx 
 "numeric"   "factor"  "integer"

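I calculate the propensity scores with the following code and extract the weights:

library(MatchIt)

# Keep only the ID, the treatment indicator, and the confounders
data.for.ps <- surg.data[, c('record_id', 'surgeries_combined_n', confounders)]

# Propensity score model: treatment ~ confounder1 + confounder2 + ...
ps.formula <- as.formula(paste('surgeries_combined_n ~', paste(confounders, collapse = ' + ')))
match.it.1 <- matchit(ps.formula, data = data.for.ps, method = 'full', distance = 'logit')
match.it.1$nn  # sample sizes before and after matching
m.data <- match.data(match.it.1)
# Redundant in practice: match.data() already returns a 'weights' column
m.data$weights <- match.it.1$weights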

No big problems. The result of the corresponding weighted survival analysis is the following (what "blue" and "red" mean does not matter here):

Example 2 is exactly the same, but with tumor stage now numeric:

> sapply(surg.data[,confounders], class)
tumor_size  TNM.STAGE smoking_hx 
 "numeric"  "numeric"  "integer"

With exactly the same code for matching and exactly the same code for the survival analysis, the result is the following:

not very different, but different.

Example 3 is exactly the same code, but with both tumor stage and smoking habit factors:

> sapply(surg.data[,confounders], class)
tumor_size  TNM.STAGE smoking_hx 
 "numeric"   "factor"   "factor"

The result, using exactly the same code, is the following:

totally different.

Now, there is no reason why either of the two potential factors should be numeric: both can be factors, yet the results are unquestionably different. Can anybody help me understand:

1. Why does this happen? I don't think it's a coding problem, but rather a matter of understanding which class is the correct one to pass to matchit.
2. Which is the "correct" solution with MatchIt, keeping in mind that in the package vignette all the variables entering the propensity score calculation are numeric or integer, even those that could be coded as factors (such as education level or marital status)?
3. Should factors always stay factors? What if a factor is coded, say, 0,1,2,3 (numeric values but class=factor): should it stay a factor?
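To make the factor-versus-numeric distinction concrete without involving MatchIt at all, here is a minimal base-R sketch. It assumes the propensity score comes from a logistic regression (which is what distance='logit' suggests) and uses simulated data; every name in it (stage, size, treat, d_num, d_fac) is hypothetical and not taken from surg.data. It only illustrates that the same variable yields a different design matrix, and therefore different fitted propensity scores, depending on its class. It does not say which coding is right for a given analysis.

```r
# Simulated data; all variable names are hypothetical, not from the question.
set.seed(1)
n     <- 500
stage <- sample(0:3, n, replace = TRUE)   # stage coded 0, 1, 2, 3
size  <- rnorm(n, mean = 3, sd = 1)

# Treatment probability depends on stage in a non-linear way
stage_effect <- c(0, 1.2, -0.4, 0.8)[stage + 1]
treat <- rbinom(n, 1, plogis(-1 + 0.3 * size + stage_effect))

d_num <- data.frame(treat, size, stage = stage)          # stage as numeric
d_fac <- data.frame(treat, size, stage = factor(stage))  # stage as factor

# Numeric: a single column, i.e. one linear slope across stages.
cols_num <- colnames(model.matrix(treat ~ size + stage, d_num))
# Factor: one dummy column per non-reference level.
cols_fac <- colnames(model.matrix(treat ~ size + stage, d_fac))
print(cols_num)
print(cols_fac)

# Same formula, different models, different propensity scores
fit_num <- glm(treat ~ size + stage, family = binomial, data = d_num)
fit_fac <- glm(treat ~ size + stage, family = binomial, data = d_fac)
max_diff <- max(abs(fitted(fit_num) - fitted(fit_fac)))
print(max_diff)

# Pitfall when converting a factor coded 0,1,2,3 back to numbers:
f <- factor(c(0, 1, 2, 3, 3))
idx  <- as.numeric(f)                 # level indices, not the labels
vals <- as.numeric(as.character(f))   # the original codes
print(idx)
print(vals)
```

Since the weights are derived from the estimated propensity scores, any change in those scores can carry through to the weighted survival curves.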

Thank you so much for your help! EM
