Reputation: 1
As part of my project, I am trying to implement a PSM-DiD model (propensity score matching combined with difference-in-differences) using the MatchIt package in R. I am very new to this and cannot figure out where my code goes wrong: the propensity scores from the initial logit model look wrong, and so do the results of the subsequent matching model.
A brief description of my data:
Variables: Under14 (treated term; 1 for age < 14, 0 otherwise), Post1986 (post-term; 1 for year <= 1986, 0 otherwise), LIT (outcome), additional covariates - AGE, SEX, FAMSIZE, NCHILD, URBAN, YEAR, STATE (last 2 variables are not very relevant).
From what I understand, propensity scores are conditional probabilities, so this model should have given the output (propensity variable) correctly. However, the propensity scores are nearly 0 for the treated units (2.22e-16) and 1 for the control units, which doesn't make sense (to me).
This is the code that I have used for calculating propensity scores:
# Step 1: Assign groups (Treated or Control) based on the 'Under14' variable
temp$group <- with(temp, ifelse(Under14 == 1, "Treated", "Control"))
temp$group <- factor(temp$group, levels = c("Treated", "Control"))
# Step 2: Fit a logistic regression model to estimate propensity scores
logit_model <- glm(group ~ URBAN + AGE + SEX + NCHILD + FAMSIZE, family = binomial, data = temp)
summary(logit_model)
# Step 3: Calculate predicted probabilities (propensity scores)
temp$propensity <- predict(logit_model, type = "response")
And did this for the matching model:
# Step 4: Perform matching using propensity scores with Nearest neighbor matching
psm <- matchit(group ~ URBAN + AGE + SEX + NCHILD + FAMSIZE, method = "nearest", distance = temp$propensity, data = temp)
# Step 5: Extract matched data
matched_data <- match.data(psm)
# Step 6: Fit the DiD model on matched data
# Group: Treated or Control
# Post1986: Post-treatment indicator
# Interaction term: group * Post1986
did_model <- lm(LIT ~ group * Post1986, data = matched_data, weights = weights)
summary(did_model)
This is the DiD output (the coefficient is labelled groupControl and the treated level is missing, for a reason I don't understand):
The matching assigns a weight of 1 to all units; hence, the final DiD model does not give the desired output. The propensity scores of 0 and 1 are also present in the final matched dataset (matched_data). I am confused about what could be wrong with my approach. If anyone could point out the mistakes, it would be very helpful.
Additionally, I also want to make the kernel density plots for the before and after matching of propensity scores, something like this:
Any help with this would also be greatly appreciated. I understand my question is very long but I am completely stuck. Thank you!
Upvotes: 0
Views: 50
Reputation: 4424
Assuming Under14 is related to AGE, this is because AGE perfectly predicts group. When you have perfect prediction, you will have propensity scores of 0 or 1. You might have seen that if you examined the propensity score model and balance on the covariates and propensity score prior to matching as instructed in the MatchIt vignettes. You skipped many steps in performing this analysis that are required for the conclusions to be valid. You should not estimate effects until you have validated that the matching was successful.
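To illustrate the point about perfect prediction, here is a minimal sketch on simulated data. The data frame, sample size, and reduced covariate set are made up for illustration; only the names AGE, SEX, Under14, and group come from the question. It also shows a related detail: when the response passed to glm() is a factor, R treats the first level as failure and all other levels as success, so with levels c("Treated", "Control") the fitted values are the probability of being Control. That would be consistent with treated units getting scores near 0.

```r
# Simulated stand-in for the question's data (values are invented).
set.seed(1)
n <- 200
temp <- data.frame(
  AGE = sample(5:25, n, replace = TRUE),
  SEX = rbinom(n, 1, 0.5)
)
temp$Under14 <- as.integer(temp$AGE < 14)  # treatment is a function of AGE

# 1) AGE separates the groups perfectly, so the fitted probabilities are
#    pushed to (numerically) 0 or 1. glm() warns about this; the warnings
#    are suppressed here only to keep the output short.
fit_sep <- suppressWarnings(
  glm(Under14 ~ AGE + SEX, family = binomial, data = temp)
)
ps_sep <- predict(fit_sep, type = "response")
tapply(ps_sep, temp$Under14, range)  # ~0 for controls, ~1 for treated

# 2) Same model with a factor response whose FIRST level is "Treated":
#    glm() now models P(group == "Control"), so the scores are flipped.
temp$group <- factor(ifelse(temp$Under14 == 1, "Treated", "Control"),
                     levels = c("Treated", "Control"))
fit_flip <- suppressWarnings(
  glm(group ~ AGE + SEX, family = binomial, data = temp)
)
ps_flip <- predict(fit_flip, type = "response")
tapply(ps_flip, temp$group, range)   # ~0 for treated, ~1 for controls

# 3) Without the perfectly predicting covariate, the scores stay
#    strictly between 0 and 1.
fit_ok <- glm(Under14 ~ SEX, family = binomial, data = temp)
ps_ok <- predict(fit_ok, type = "response")
range(ps_ok)
```

Using a 0/1 treatment variable (or a factor with the control level first) avoids the flipped interpretation, but it does not fix the separation itself; that still depends on whether AGE belongs in the propensity score model.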
In this case, you need to decide whether AGE makes sense as a confounder to adjust for. If I am wrong about it perfectly predicting group, you need to figure out why you have perfect prediction of group. It may be that there is fundamental imbalance in a key covariate that cannot be adjusted for using matching; you need to decide how to solve that.
Another issue is that it looks like you are matching using the entire dataset, when you should be matching only the dataset prior to receiving treatment (i.e., only the pre-period). Typically, you can accomplish this by matching in the pre-period and then merging the panel data into the matched dataset so that only the matched units from the panel remain.
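A rough sketch of that workflow, on simulated two-period panel data, might look like the following. Everything about the data is an assumption for illustration (the columns id, treat, post, x1, x2, y and the effect sizes are invented); the MatchIt calls matchit() and match.data() are the same ones used in the question.

```r
library(MatchIt)

# Simulated two-period panel: one row per unit per period (invented data).
set.seed(42)
n_units <- 300
units <- data.frame(
  id = seq_len(n_units),
  x1 = rnorm(n_units),
  x2 = rbinom(n_units, 1, 0.5)
)
units$treat <- rbinom(n_units, 1, plogis(-0.5 + 0.8 * units$x1))

# merge() with no shared columns gives the Cartesian product: 2 rows per unit.
panel <- merge(units, data.frame(post = c(0, 1)))
panel$y <- with(panel,
  1 + 0.5 * x1 + 0.3 * treat + 0.2 * post + 1.0 * treat * post +
    rnorm(nrow(panel))
)

# Match on the pre-period rows only, using a 0/1 treatment variable.
pre <- panel[panel$post == 0, ]
m.out <- matchit(treat ~ x1 + x2, data = pre,
                 method = "nearest", distance = "glm")
summary(m.out)  # check covariate balance before estimating anything

# Keep the matched units, then bring back both periods for those units.
matched_pre <- match.data(m.out)
matched_panel <- merge(panel, matched_pre[, c("id", "weights")], by = "id")

# DiD on the matched panel; the interaction term is the DiD estimate.
did_fit <- lm(y ~ treat * post, data = matched_panel, weights = weights)
coef(did_fit)["treat:post"]
```

The standard errors from lm() here ignore both the matching and the repeated observations per unit, so treat this as a sketch of the data handling rather than a complete analysis.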
Instead of doing all of this, though, I would recommend you use a method and package specifically designed for your scenario, like the did package. It uses more sophisticated methods for adjusting for covariates but does so without requiring you to perform multiple steps, each of which might be error prone.
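For orientation, a minimal sketch of what a did analysis looks like is below. It uses mpdta, the example dataset shipped with the package, not the data from the question, so the variable names (lemp, lpop, year, countyreal, first.treat) are specific to that example.

```r
library(did)

# Example county-level panel bundled with the did package.
data(mpdta)

set.seed(123)  # inference uses a bootstrap by default
out <- att_gt(
  yname   = "lemp",         # outcome
  tname   = "year",         # time period
  idname  = "countyreal",   # unit identifier
  gname   = "first.treat",  # first treated period (0 = never treated)
  xformla = ~ lpop,         # covariates to adjust for
  data    = mpdta
)
summary(out)

# Aggregate the group-time effects into a single overall estimate.
agg <- aggte(out, type = "simple")
summary(agg)
```

To apply this to the question's data, you would need a variable playing the role of first.treat (the first period in which a unit is treated, 0 for never-treated units). If the data are repeated cross-sections rather than a panel, att_gt() has a panel argument that can be set to FALSE; whether that setup fits this particular design is something to check against the package documentation.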
Upvotes: 0