Adam Robinsson

Reputation: 1751

R: Obtain the matched data set from Matching package - not that easy

I'm trying to obtain the matched data set from a propensity score match, using the Matching package. It works well when I do 1-to-1 matching, but not when I try 1-to-2 matching.

Here goes the code:

> require(Matching)
> data(lalonde)
> # Estimate the propensity model
> glm1  <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
+                   hisp + married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) +
+                   u74 + u75, family=binomial, data=lalonde)
> 
> #save data objects
> X  <- glm1$fitted
> Y  <- lalonde$re78
> Tr  <- lalonde$treat
> 
> # one-to-two matching with replacement
> rr  <- Match(Y=NULL, Tr=Tr, X=X, M=2, ties=F, caliper=0.01);
> summary(rr)

Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  445 
Original number of treated obs...............  185 
Matched number of observations...............  97 
Matched number of observations  (unweighted).  194 

Caliper (SDs)........................................   0.01 
Number of obs dropped by 'exact' or 'caliper'  88 

> 
> #Obtain the matched data set
> matched <- rbind(lalonde[rr$index.treated,], lalonde[rr$index.control,])
> 
> nrow(matched)
[1] 388

I've tried various ways to solve this. My aim is to match each treated individual to two controls and then extract just these individuals from the entire data set. I've searched the web and the package author's documentation without success. Unfortunately, all the examples I have found so far either do 1:1 matching or do not extract the matched data set.

I'd really appreciate some help.

Upvotes: 4

Views: 2663

Answers (1)

tchakravarty

Reputation: 10964

This is actually fairly straightforward to reconstruct once you note that each value in index.treated is repeated M times, for those treated cases that have matches within the caliper distance.
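To make that repetition concrete, here is a minimal sketch on a toy vector. The values in idx_treated are hypothetical stand-ins for rr$index.treated, and the sketch assumes the repeats of each treated index sit next to each other. The last line numbers the controls within each treated case without hard-coding M.

```r
# Hypothetical stand-in for rr$index.treated after Match(..., M = 2, ties = FALSE)
idx_treated <- c(1, 1, 3, 3, 4, 4)

# Run lengths of consecutive equal values: one run per treated case
runs <- rle(idx_treated)
runs$values   # the distinct treated cases, in order
runs$lengths  # how many controls each treated case received

# Number the controls within each treated case (1, 2, ...) without assuming a fixed M
ctrl_num <- ave(seq_along(idx_treated), idx_treated, FUN = seq_along)
ctrl_num
```

If every entry of runs$lengths equals M, each treated case received its full set of controls.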

So, in your case, the first two elements of index.control are the index numbers of the two controls matched to the treated case that fills the first two elements of index.treated. You can retrieve the entire list and organize it as one row per treated case as follows:

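# Pair each treated index with its matched control and label the controls 1..M (M = 2 here).
# rep(1:2) is recycled down the rows, so this relies on every treated case having exactly 2 controls.
dfTC = data.frame(idxTreated = rr$index.treated, idxControl = rr$index.control,
                  numControl = factor(rep(1:2), labels = paste0("Control", 1:2)))
# Reshape to one row per treated case, with one column per matched control
dfTCWide = reshape2::dcast(dfTC, idxTreated ~ numControl,
                           value.var = "idxControl")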

You can check that this works:

> head(dfTCWide)
  idxTreated Control1 Control2
1          1      271      386
2          3      216      259
3          4      254      359
4          5      230      255
5          6      188      220
6          8      242      279
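To go from these index pairs to the matched data set itself, one option is to keep each individual once and record how often they were used. The rbind in the question returns 388 rows because every treated row is stacked once per matched control. The sketch below uses base R only. The data frame dat and the two index vectors are hypothetical stand-ins for lalonde, rr$index.treated and rr$index.control, and the n_used column is an illustrative name, not something the Matching package provides.

```r
# Toy stand-ins (hypothetical); with the real objects use
# dat <- lalonde, idx_treated <- rr$index.treated, idx_control <- rr$index.control
dat <- data.frame(id = 1:10,
                  treat = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0))
idx_treated <- c(1, 1, 3, 3)   # each treated index repeated M = 2 times
idx_control <- c(5, 8, 8, 10)  # control 8 is reused (matching with replacement)

# Row indices of every distinct individual that takes part in a match
keep <- unique(c(idx_treated, idx_control))
matched <- dat[keep, ]

# Count each treated case once and each control once per match it appears in;
# the count can serve as a frequency weight when controls are reused
counts <- table(c(unique(idx_treated), idx_control))
matched$n_used <- as.integer(counts[as.character(keep)])

matched
```

Keeping one row per individual plus a usage count avoids duplicated rows while preserving the information that a control was matched more than once. Whether to weight by that count in later analysis is a modelling decision this sketch does not settle.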

Upvotes: 3
