I am running coarsened exact matching (CEM) with the MatchIt package as a pre-processing step and want to use the matched data in further analyses. When looking at summary statistics for the matched data, I noticed that the means computed from the matched dataset differ from those in the MatchIt summary output. For example, using the lalonde dataset:
library(MatchIt)
library(doBy)
data(lalonde)
m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "cem")
summary(m.out) #Means from MatchIt summary output:
Summary of balance for matched data:
         Means Treated  Means Control
age            21.5441        21.1781
educ           10.2941        10.3827
black           0.8676         0.8676
hispan          0.0588         0.0588
married         0.0441         0.0441
nodegree        0.6176         0.6176
re74          456.1345       622.8740
re75          350.6728       520.7135
m.dat <- match.data(m.out)
ExtractedMeans <- summaryBy(age + educ + black + hispan + married + nodegree + re74 + re75 ~ treat, data = m.dat, FUN = function(x) { c(Mean = mean(x)) })
ExtractedMeans #Means extracted manually from matched data:
treat                1          0
age.Mean        21.544     19.628
educ.Mean       10.294     9.7179
black.Mean      0.8676    0.60256
hispan.Mean     0.0588    0.10256
married.Mean    0.0441    0.07692
nodegree.Mean   0.6176    0.75641
re74.Mean       456.13     609.61
re75.Mean       350.67     464.22
The means for the control group extracted manually from the matched data are not consistent with the MatchIt summary output. Does anybody know what is going on here? I posted this question to the MatchIt gmane email list last week but have not received a response. Thank you for any help.
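The summaryBy() function from doBy is not using the weights: it computes plain, unweighted means and ignores the weights column that match.data() adds, whereas the matched-data means in summary(m.out) are weighted. If you multiply each variable by the weights before averaging, you get the same means that the package displays. As an example, take your code and do this: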
> tapply(m.dat$age, m.dat$treat, mean)
0 1
19.62821 21.54412
> tapply(m.dat$age*m.dat$weights, m.dat$treat, mean)
0 1
21.17811 21.54412
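The weighted control mean (21.17811) now agrees with the MatchIt summary output (21.1781), and the treated mean is unchanged because every treated unit has weight 1.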
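The shortcut above appears to work because MatchIt scales the weights so that they sum to the number of matched units in each group (assuming that scaling; treated units all get weight 1). mean(x * w) is sum(x * w) / n, which equals the weighted mean sum(x * w) / sum(w) only when sum(w) = n. A sketch that divides by the sum of the weights explicitly, with base R's weighted.mean(), does not depend on that scaling. The toy data below are made up for illustration, and weighted_group_means() is a hypothetical helper, not part of MatchIt or doBy.

```r
# Hypothetical toy data standing in for match.data(m.out): treated units have
# weight 1, and the (made-up) control weights sum to the number of controls (3),
# mirroring the assumed MatchIt weight scaling.
toy <- data.frame(
  treat   = c(1, 1, 0, 0, 0),
  age     = c(20, 24, 18, 22, 30),
  educ    = c(10, 12, 9, 11, 8),
  weights = c(1, 1, 1.5, 1.0, 0.5)
)

# Unweighted group means: what summaryBy() or tapply(x, g, mean) report
unweighted <- tapply(toy$age, toy$treat, mean)

# The shortcut from the answer: average x * w within each group
# (correct only because the weights sum to the group size)
shortcut <- tapply(toy$age * toy$weights, toy$treat, mean)

# Hypothetical helper: weighted means for several covariates by group.
# weighted.mean() divides by sum(w), so it does not rely on the scaling.
weighted_group_means <- function(data, vars, group = "treat", w = "weights") {
  sapply(split(data, data[[group]]), function(d)
    sapply(vars, function(v) weighted.mean(d[[v]], d[[w]])))
}

wm <- weighted_group_means(toy, c("age", "educ"))
print(wm)          # rows = covariates, columns = treat levels "0" and "1"
print(unweighted)  # the control mean differs because the weights are ignored
```

On the real data the same call would be something like weighted_group_means(m.dat, c("age", "educ", "re74", "re75")).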