Differing results using CEM and MatchIt R packages

Question

I ran the same model with the cem package and with the MatchIt package using method = "cem", but I am unable to get the same number of matched observations. Why is this the case? Shouldn't the two packages give the same results under the same specification? For simplicity, I use the lalonde dataset, match on only three variables, and supply pre-defined cutpoints so that the coarsening is the same in both. MatchIt produces 429 matched observations in total, whereas cem produces 441.

library(tidyverse)
library(cem)
library(MatchIt)

data(lalonde)

lalond2 <- lalonde %>% 
  select(treat, age, re74, re75, re78)


re74cut <- seq(0, 40000, 5000)
re75cut <- seq(0, max(lalond2$re75) + 1000, by = 1000)
agecut <- c(20.5, 25.5, 30.5, 35.5, 40.5)
my.cutpoints <- list(re75=re75cut, re74=re74cut, age=agecut)


m.out <- matchit(treat ~ age + re74 + re75, data = lalond2, 
                 method = "cem",
                 cutpoints = my.cutpoints)

c.out <- cem(treatment = "treat",
             data = lalond2,
             drop = c("treat", "re78"),
             cutpoints = my.cutpoints)
             
m.out
A matchit object
 - method: Coarsened exact matching
 - number of obs.: 614 (original), 429 (matched)
 - target estimand: ATT
 - covariates: age, re74, re75
 
c.out
           G0  G1
All       429 185
Matched   277 164
Unmatched 152  21

Noah · Accepted Answer

There are two differences between the implementations of CEM in cem and MatchIt. The first is a bug in cem, and the second is an arbitrary choice that can be worked around. It is possible to get identical results from the two packages, as I'll demonstrate below.

First, cem has a bug (or just an undesirable feature) whereby any units outside the given cutpoints will be grouped together. For example, any unit with age less than 20.5 or greater than 40.5 will be placed in the same stratum. Take a look at stratum number 142 and you'll see that this is exactly what happens:

> lalond2[c.out$strata == 142, 2:4]
        age     re74     re75
NSW134   20 16318.62 1484.994
PSID201  46 19171.43 1317.677
PSID204  45 16559.72 1265.758

If you want to prevent this from happening, you need to define your cutpoints to encompass the entire range of the data, not just the internal cutpoints. A straightforward way to do this is to replace each cutpoint vector, e.g., agecut, with c(-Inf, agecut, Inf). This will correctly bound the upper and lower strata of age and separate those groups. MatchIt does this automatically.
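
As a rough illustration of the effect, here is a minimal base-R sketch. It uses cut() as a stand-in for the coarsening step, which is an assumption made for illustration and not a claim about how cem works internally. The toy age vector is also made up, loosely echoing the ages in stratum 142.

```r
# Cutpoints from the question
agecut <- c(20.5, 25.5, 30.5, 35.5, 40.5)

# Toy ages (hypothetical): one below the lowest cutpoint, one inside the
# range, and two above the highest cutpoint
age <- c(20, 23, 45, 46)

# Without outer bounds, every value outside the range gets the same code (NA),
# so the 20-year-old cannot be told apart from the 45- and 46-year-olds
unpadded <- cut(age, breaks = agecut, labels = FALSE)

# With -Inf and Inf added, the low and high values land in separate bins
padded <- cut(age, breaks = c(-Inf, agecut, Inf), labels = FALSE)

print(unpadded)
print(padded)
```

Here unpadded comes out as NA 1 NA NA, while padded comes out as 1 2 6 6, so the youngest unit is no longer lumped in with the oldest ones.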

The second difference has to do with how values that fall exactly on a cutpoint are treated. In cem, units on the cutpoint border are placed into the lower stratum; in MatchIt, they are placed into the upper stratum. To avoid the discrepancy, choose cutpoint values that no unit's data fall exactly on. This can be achieved by adding a small constant to each cutpoint value. For example, you can replace agecut with agecut + .001. Then there is no ambiguity, and the results of the two packages will align.
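
The two border conventions can be mimicked in base R. In this sketch, cut() (right-closed intervals by default) and findInterval() (left-closed intervals by default) are stand-ins chosen for illustration; that either package uses them internally is an assumption. The breaks and values are toy numbers.

```r
# Toy cutpoints and values (hypothetical); 5000 sits exactly on a cutpoint
breaks <- c(0, 5000, 10000)
x <- c(4999, 5000, 5001)

# Right-closed bins (a, b]: a value on the border goes to the lower stratum
lower_rule <- cut(x, breaks = breaks, labels = FALSE)

# Left-closed bins [a, b): a value on the border goes to the upper stratum
upper_rule <- findInterval(x, breaks)

# Shift the cutpoints slightly so that no value sits exactly on one
shifted <- breaks + 0.001
lower_shifted <- cut(x, breaks = shifted, labels = FALSE)
upper_shifted <- findInterval(x, shifted)

print(rbind(lower_rule, upper_rule))
print(rbind(lower_shifted, upper_shifted))
```

Before the shift the two rules disagree only for the value 5000 (bin 1 versus bin 2). After the shift both rules assign bins 1, 1, 2.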

So, to wrap up, two steps ensure the two packages yield the same results. First, make sure all units are explicitly bounded by the cutpoint vectors, which can be done by surrounding the desired cutpoints with -Inf and Inf. Second, make sure the cutpoints always fall between variable values and not on them, which can be done by adding a small constant to the desired cutpoints: positive if you want values at the boundary to be in the lower stratum, negative if you want them to be in the upper stratum.
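
Both adjustments can be applied to a whole list of cutpoints at once. The helper below, fix_cutpoints, is a hypothetical name introduced here, and the default offset of 0.001 is only an example. The sketch uses the age and re74 cutpoints from the question; re75 would be handled the same way.

```r
# Hypothetical helper: shift each cutpoint by a small constant so no data value
# sits on a border, then bound the whole range with -Inf and Inf
fix_cutpoints <- function(breaks, eps = 0.001) {
  c(-Inf, breaks + eps, Inf)
}

# Cutpoints as defined in the question (re75 omitted because it depends on the data)
my.cutpoints <- list(
  re74 = seq(0, 40000, 5000),
  age  = c(20.5, 25.5, 30.5, 35.5, 40.5)
)

# Apply the helper to every variable's cutpoint vector
fixed.cutpoints <- lapply(my.cutpoints, fix_cutpoints)

str(fixed.cutpoints)
```

The resulting fixed.cutpoints list could then be supplied as the cutpoints argument in both the matchit() and cem() calls from the question.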
