jk3

Reputation: 47

Discordant data after propensity match using MatchIt

I am attempting to use MatchIt on a large dataset using the code below:

match.it <- matchit(TX_NUM ~ AGE + GENDER + PRIMARY_IND + STATUS_1 + ETHNICITY + RF_RENAL + RF_LIVER + RF_VENT, data = matched, method = "exact")

When I look at the summary, I get this:

summary(match.it)

Summary of Balance for Matched Data:
                                              Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
AGE                                                 10.7627       10.7627               0     1.0018         0        0
GENDERF                                              0.5253        0.5253               0          .         0        0
GENDERM                                              0.4747        0.4747              -0          .         0        0
PRIMARY_INDAcute rejection (re-Tx)                   0.0000        0.0000               0          .         0        0
PRIMARY_INDCAD                                       0.0000        0.0000               0          .         0        0
PRIMARY_INDCAD (re-Tx)                               0.0000        0.0000               0          .         0        0
PRIMARY_INDCHD                                       0.5190        0.5190               0          .         0        0
PRIMARY_INDChronic rejection (re-Tx)                 0.0000        0.0000               0          .         0        0
PRIMARY_INDDCM                                       0.4304        0.4304               0          .         0        0
PRIMARY_INDHCM                                       0.0095        0.0095              -0          .         0        0
PRIMARY_INDHyperacute rejection (re-Tx)              0.0000        0.0000               0          .         0        0
PRIMARY_INDnan                                       0.0000        0.0000               0          .         0        0
PRIMARY_INDNon-specific graft failure (re-Tx)        0.0000        0.0000               0          .         0        0
PRIMARY_INDOther                                     0.0063        0.0063              -0          .         0        0
PRIMARY_INDPrimary graft failure (re-Tx)             0.0000        0.0000               0          .         0        0
PRIMARY_INDRCM                                       0.0348        0.0348              -0          .         0        0
PRIMARY_INDUnknown                                   0.0000        0.0000               0          .         0        0
PRIMARY_INDValvular disease                          0.0000        0.0000               0          .         0        0
STATUS_1                                             0.8766        0.8766               0          .         0        0
ETHNICITY                                            0.1424        0.1424               0          .         0        0
RF_RENAL                                             0.3386        0.3386              -0          .         0        0
RF_LIVER                                             0.2089        0.2089               0          .         0        0
RF_VENT                                              0.0949        0.0949               0          .         0        0

I would think this yields a perfect match. The Love plot also shows an absolute standardized mean difference of 0 across all variables. However, when I put it into a data frame using

matched <- match.data(match.it)
matched <- as.data.frame(matched)

and look at the summary for a variable like age, the average doesn't match.

summary(matched$AGE)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.000   1.000   9.000   8.334  14.000  18.000

And if I attempt a t-test (or Chi-squared for the categorical variables) between the control and treated groups, there is a large difference between the two.

Can anyone help explain the discordance between the summary table I get for the match data and the results I'm seeing in the actual data frame and how I might correct them?

Upvotes: 0

Views: 152

Answers (1)

Noah

Reputation: 4414

Exact matching places units into strata based on their covariate values. The strata are used to compute weights, which, when applied to the sample, adjust for the variables used in the match. You MUST incorporate the weights into balance assessment and treatment effect estimation for the matching to work; failing to do so is what produced the discrepancies in this post. summary() automatically applies the weights when computing the balance statistics, whereas summary(matched$AGE) and an ordinary t-test do not. To estimate the treatment effect, you can include the weights in a regression of the outcome on the treatment, as described in the vignette on estimating effects.
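To see why an unweighted summary can disagree with the balance table, here is a minimal base-R sketch of stratum weights on made-up data. The toy data frame and the column name w are hypothetical, and the weights use the plain treated-to-control ratio within each stratum; MatchIt may additionally rescale the control weights, which would not change a weighted mean.

```r
# Hypothetical toy data: units are exactly matched on age,
# so each distinct age value forms one stratum.
toy <- data.frame(
  treat = c(1, 0, 0, 0, 1, 1, 0),
  age   = c(2, 2, 2, 2, 15, 15, 15)
)

# Number of treated and control units in each unit's stratum.
n_treated <- ave(toy$treat, toy$age, FUN = sum)
n_control <- ave(1 - toy$treat, toy$age, FUN = sum)

# Treated units keep weight 1; each control is weighted by the
# treated-to-control ratio of its stratum (an assumed, simplified scheme).
toy$w <- ifelse(toy$treat == 1, 1, n_treated / n_control)

# Unweighted group means differ because the strata sizes differ by group.
unweighted <- tapply(toy$age, toy$treat, mean)

# Weighted group means agree once the stratum weights are applied.
weighted <- sapply(split(toy, toy$treat),
                   function(d) weighted.mean(d$age, d$w))

print(unweighted)
print(weighted)
```

In this toy sample the controls are concentrated in the young stratum, so their raw mean is pulled down; the weights restore the treated group's age distribution among the controls.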

To compute weighted means, you can run weighted.mean(matched$AGE, matched$weights) (match.data() stores the weights in a lowercase weights column), which will show that the mean in the weighted sample is identical to that in each treatment group.
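As a rough illustration of the same check with MatchIt itself, the sketch below uses the lalonde dataset that ships with the package as a stand-in for the poster's data; the choice of covariates and of re78 as the outcome is an assumption made only for this example.

```r
library(MatchIt)

# lalonde ships with MatchIt; it stands in for the poster's data here.
data("lalonde", package = "MatchIt")

# Exact matching on three categorical covariates (illustrative choice).
m_exact <- matchit(treat ~ race + married + nodegree,
                   data = lalonde, method = "exact")

# match.data() appends the matching weights in a column named `weights`.
md <- match.data(m_exact)

# Weighted mean of a matching covariate within each treatment group.
wm_by_group <- sapply(split(md, md$treat),
                      function(d) weighted.mean(d$married, d$weights))
print(wm_by_group)

# The same weights column goes into the outcome regression.
fit <- lm(re78 ~ treat, data = md, weights = weights)
print(coef(fit)["treat"])
```

This only shows where the weights enter; the vignette on estimating effects is the place to look for how to obtain appropriate standard errors.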

If you want to do 1:1 matching so that you don't need to incorporate weights into the final analysis, you should instead set method = "nearest" and supply the exact matching variables in the exact argument. There is no good reason to do this, however; you are just throwing away data.
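A minimal sketch of that alternative, again using the package's lalonde data as a stand-in; the covariates in the formula and in exact are illustrative assumptions, not the poster's variables.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbor matching on the propensity score, with pairs
# additionally required to agree exactly on the variables in `exact`.
m_nn <- matchit(treat ~ age + educ + race + married + nodegree,
                data = lalonde, method = "nearest",
                exact = ~ race + married + nodegree)

md_nn <- match.data(m_nn)

# Each retained treated unit is paired with one control.
print(table(md_nn$treat))

# Inspect the weights of the retained units.
print(summary(md_nn$weights))
```

Units that cannot find an exact partner are left unmatched and dropped from match.data(), which is the loss of data the answer warns about.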

Upvotes: 0
