Reputation: 13
I ran the same models with the cem package and with the MatchIt package using method = "cem", but I am unable to get the same number of matched observations. Why is this the case? Shouldn't it be possible to get the same results from the two packages when using the same specifications? For simplicity, I use the lalonde dataset, match on only three variables, and use predefined cutpoints to make sure these are the same. The MatchIt package produces 429 matched observations in total, whereas the cem package produces 441 (277 control plus 164 treated).
library(tidyverse)
library(cem)
library(MatchIt)
data(lalonde)
lalond2 <- lalonde %>%
  select(treat, age, re74, re75, re78)
re74cut <- seq(0, 40000, 5000)
# re75 cutpoints every 1000, extending just past the observed maximum
re75cut <- seq(0, max(lalond2$re75) + 1000, by = 1000)
agecut <- c(20.5, 25.5, 30.5, 35.5, 40.5)
my.cutpoints <- list(re75=re75cut, re74=re74cut, age=agecut)
m.out <- matchit(treat ~ age + re74 + re75, data = lalond2,
                 method = "cem",
                 cutpoints = my.cutpoints)
c.out <- cem(treatment = "treat",
             data = lalond2,
             drop = c("treat", "re78"),
             cutpoints = my.cutpoints)
m.out
A matchit object
- method: Coarsened exact matching
- number of obs.: 614 (original), 429 (matched)
- target estimand: ATT
- covariates: age, re74, re75
c.out
           G0  G1
All       429 185
Matched   277 164
Unmatched 152  21
Upvotes: 1
Views: 553
Reputation: 4414
There are two differences between the implementations of CEM in cem and MatchIt. The first is a bug in cem and the second is an arbitrary choice that can be fixed. It is possible to get identical results from the two packages, as I'll demonstrate below.
First, cem has a bug (or just an undesirable feature) whereby any units outside the given cutpoints will be grouped together. For example, any unit with age less than 20.5 or greater than 40.5 will be placed in the same stratum. Take a look at stratum number 142 and you'll see that this is exactly what happens:
> lalond2[c.out$strata == 142, 2:4]
        age     re74     re75
NSW134   20 16318.62 1484.994
PSID201  46 19171.43 1317.677
PSID204  45 16559.72 1265.758
If you want to prevent this from happening, you need to define your cutpoints to encompass the entire range of the data, not just the internal cutpoints. A straightforward way to do this is to replace each cutpoint vector, e.g., agecut, with c(-Inf, agecut, Inf). This will correctly bound the upper and lower strata of age and separate those groups. MatchIt does this automatically.
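The effect of bounding the cutpoints can be illustrated with base R's cut(). This is only an analogy for the behavior described above, not cem's internal code; the three ages are those of the units in stratum 142.

```r
# Illustration with base R's cut(); an analogy, not cem's internal code.
agecut <- c(20.5, 25.5, 30.5, 35.5, 40.5)
age <- c(20, 46, 45)  # ages of NSW134, PSID201, PSID204 from stratum 142

# Interior cutpoints only: all three ages fall outside every interval,
# so they all end up in the same (missing) bin.
inner <- cut(age, breaks = agecut)

# Bounded cutpoints: the 20-year-old lands in the lowest bin and the
# two older units land in the highest bin.
bounded <- cut(age, breaks = c(-Inf, agecut, Inf))

print(inner)
print(bounded)
```

With the bounded version, the young unit and the two older units no longer share a bin on age.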
The second difference has to do with how values on the cutpoint borders are treated. In cem, units on the cutpoint border will be placed into the lower stratum, and in MatchIt, they will be placed into the upper stratum. To remove this discrepancy, choose cutpoint values that no unit takes in the data. This can be achieved by adding a small constant to each cutpoint value. For example, you can replace agecut with agecut + .001. Then there will be no ambiguity and the results from the two packages will align.
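A rough sketch of the border issue, using two base R functions that happen to follow the two conventions: cut() with right = TRUE puts a border value in the lower bin, and findInterval() puts it in the upper bin. They stand in for the two packages here as an assumption for illustration; the ages and breaks are made up.

```r
# Made-up ages and breaks; 25 and 30 sit exactly on a cutpoint.
ages <- c(25, 27, 30)
breaks <- c(20, 25, 30, 35)

# Lower-stratum convention: intervals are (a, b].
lower_rule <- as.integer(cut(ages, breaks = breaks, right = TRUE))
# Upper-stratum convention: intervals are [a, b).
upper_rule <- findInterval(ages, breaks)

# Shift the cutpoints slightly so that no age sits on a border.
shifted <- breaks + 0.001
lower_shifted <- as.integer(cut(ages, breaks = shifted, right = TRUE))
upper_shifted <- findInterval(ages, shifted)

print(rbind(lower_rule, upper_rule))        # bins differ for 25 and 30
print(rbind(lower_shifted, upper_shifted))  # bins agree for every age
```

Once the cutpoints are shifted, both conventions assign every unit to the same bin, which is the property needed for the two packages to agree.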
So, to wrap up: to make the two packages yield the same results, ensure that all units are explicitly bounded within the values of the cutpoints vectors, which can be done by surrounding the desired cutpoints with -Inf and Inf, and ensure that the cutpoints always fall between variable values and not on them, which can be done by adding a small constant to the desired cutpoints, positive if you want values at the boundary to be in the lower stratum and negative if you want them to be in the upper stratum.
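Both adjustments can be applied to a whole cutpoints list at once. The helper below, bound_and_shift, is a hypothetical name and not part of either package, and the upper limit of 26000 for re75 is an assumed value for illustration. The commented calls mirror the ones in the question and are not run here, so whether both packages accept the infinite endpoints and return identical counts should be checked on your own data.

```r
# Hypothetical helper (not part of cem or MatchIt): shift each cutpoint
# vector by eps, then bound it with -Inf and Inf.
bound_and_shift <- function(cutpoints, eps = 0.001) {
  lapply(cutpoints, function(b) c(-Inf, b + eps, Inf))
}

my.cutpoints <- list(
  re75 = seq(0, 26000, by = 1000),  # assumed upper limit for illustration
  re74 = seq(0, 40000, by = 5000),
  age  = c(20.5, 25.5, 30.5, 35.5, 40.5)
)
safe.cutpoints <- bound_and_shift(my.cutpoints)
str(safe.cutpoints)

# Assumed usage, mirroring the calls in the question (needs both packages
# and the lalond2 data frame; not run here):
# m.out2 <- matchit(treat ~ age + re74 + re75, data = lalond2,
#                   method = "cem", cutpoints = safe.cutpoints)
# c.out2 <- cem(treatment = "treat", data = lalond2,
#               drop = c("treat", "re78"), cutpoints = safe.cutpoints)
```

Using a positive eps keeps boundary values in the lower stratum; pass a negative eps to put them in the upper stratum instead.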
Upvotes: 3