dwaynebeckham27

Reputation: 1

Issues with PSM-DiD model and its related density plot

As part of my project, I am trying to implement a PSM-DiD model using the MatchIt package in R. I am very new to this and cannot figure out where my code goes wrong: the propensity scores look wrong after the initial logit model and, subsequently, after the matching model.

A brief description of my data:

Variables: Under14 (treated term; 1 for age < 14, 0 otherwise), Post1986 (post-term; 1 for year <= 1986, 0 otherwise), LIT (outcome), additional covariates - AGE, SEX, FAMSIZE, NCHILD, URBAN, YEAR, STATE (last 2 variables are not very relevant).

From what I understand, propensity scores are conditional probabilities, so this model should have given the output (propensity variable) correctly. However, the propensity scores are nearly 0 for the treated units (2.22e-16) and 1 for the control units, which doesn't make sense (to me).

This is the code that I have used for calculating propensity scores:

# Step 1: Assign groups (Treated or Control) based on the 'Under14' variable
temp$group <- with(temp, ifelse(Under14 == 1, "Treated", "Control"))
temp$group <- factor(temp$group, levels = c("Treated", "Control"))

# Step 2: Fit a logistic regression model to estimate propensity scores
logit_model <- glm(group ~ URBAN + AGE + SEX + NCHILD + FAMSIZE, family = binomial, data = temp)
summary(logit_model)

# Step 3: Calculate predicted probabilities (propensity scores)
temp$propensity <- predict(logit_model, type = "response")

I then did this for the matching model:

# Step 4: Perform matching using propensity scores with Nearest neighbor matching
psm <- matchit(group ~ URBAN + AGE + SEX + NCHILD + FAMSIZE, method = "nearest", distance = temp$propensity, data = temp)

# Step 5: Extract matched data
matched_data <- match.data(psm)

# Step 6: Fit the DiD model on matched data
# Group: Treated or Control
# Post1986: Post-treatment indicator
# Interaction term: group * Post1986
did_model <- lm(LIT ~ group * Post1986, data = matched_data, weights = weights)
summary(did_model)

This is the DiD output (the coefficient is labelled groupControl for some reason I don't understand; that is, the treated one is missing):

[Image: DiD model output]

The matching gives weights of 1 for all units; hence, the final DiD model does not give the desired output. The propensity scores of 0 and 1 are also present in the final dataset, matched_data. I am confused about what could be wrong with my approach. If anyone could point out the mistakes, it would be very helpful.

Additionally, I also want to make the kernel density plots for the before and after matching of propensity scores, something like this:

[Image: example density plot]

Any help with this would also be greatly appreciated. I understand my question is very long but I am completely stuck. Thank you!

Upvotes: 0

Views: 50

Answers (1)

Noah

Reputation: 4424

Assuming Under14 is related to AGE, this is because AGE perfectly predicts group. When you have perfect prediction, you will have propensity scores of 0 or 1. You might have seen that if you examined the propensity score model and balance on the covariates and propensity score prior to matching as instructed in the MatchIt vignettes. You skipped many steps in performing this analysis that are required for the conclusions to be valid. You should not estimate effects until you have validated that the matching was successful.
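The perfect-prediction effect can be reproduced on simulated data. This is only a sketch: the data are made up, and the names fit_sep, fit_ok, ps_sep and ps_ok are illustrative. It assumes, as above, that Under14 is defined directly from AGE.

```r
# Simulated data; every name and value here is illustrative, not the asker's data
set.seed(1)
n <- 500
AGE <- sample(5:25, n, replace = TRUE)
Under14 <- as.integer(AGE < 14)   # treatment is a deterministic function of AGE
SEX <- rbinom(n, 1, 0.5)

# With AGE in the model the groups are perfectly separated; glm() would warn
# about fitted probabilities numerically 0 or 1, which is silenced here
fit_sep <- suppressWarnings(glm(Under14 ~ AGE + SEX, family = binomial))
ps_sep <- fitted(fit_sep)

# For contrast only: a model without AGE, where SEX is unrelated to treatment
fit_ok <- glm(Under14 ~ SEX, family = binomial)
ps_ok <- fitted(fit_ok)

# Range of the scores within each group under both models
sapply(list(with_AGE = ps_sep, without_AGE = ps_ok),
       function(p) c(treated = range(p[Under14 == 1]),
                     control = range(p[Under14 == 0])))
```

In this sketch the treated units sit at 1 and the controls at 0. In the question the direction is flipped because group is a factor with levels c("Treated", "Control"): for a binomial glm() the first factor level is treated as failure, so the fitted values there are probabilities of being "Control". The same level ordering is why the DiD coefficient is labelled groupControl.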

In this case, you need to decide whether AGE makes sense as a confounder to adjust for. If I am wrong about it perfectly predicting group, you need to figure out why you have perfect prediction of group. It may be that there is fundamental imbalance in a key covariate that cannot be adjusted for using matching; you need to decide how to solve that.
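One rough way to look for such a covariate is to compare its observed range in each group. The helper below, check_overlap, is hypothetical (it is not part of MatchIt) and the data are simulated; range overlap is only a coarse screen and does not replace a proper balance assessment.

```r
# Hypothetical helper: for each covariate, report the range in each group and
# whether the two ranges overlap at all (1 = overlap, 0 = no overlap)
check_overlap <- function(data, treat, covariates) {
  t(sapply(covariates, function(v) {
    r1 <- range(data[[v]][data[[treat]] == 1], na.rm = TRUE)
    r0 <- range(data[[v]][data[[treat]] == 0], na.rm = TRUE)
    c(treated_min = r1[1], treated_max = r1[2],
      control_min = r0[1], control_max = r0[2],
      overlap = as.numeric(r1[1] <= r0[2] && r0[1] <= r1[2]))
  }))
}

# Simulated example in which the treatment indicator is defined from AGE
set.seed(4)
toy <- data.frame(AGE = sample(5:25, 300, replace = TRUE),
                  FAMSIZE = sample(2:8, 300, replace = TRUE))
toy$Under14 <- as.integer(toy$AGE < 14)

res <- check_overlap(toy, "Under14", c("AGE", "FAMSIZE"))
res
```

A covariate with no overlap between groups, like AGE in this toy data, cannot be balanced by matching, because no control unit resembles any treated unit on it.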

Another issue is that it looks like you are matching using the entire dataset, when you should be matching only the dataset prior to receiving treatment (i.e., only the pre-period). Typically, you can accomplish this by matching in the pre-period and then merging the panel data into the matched dataset so that only the matched units from the panel remain.
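A minimal sketch of that workflow, on simulated data, might look like the following. It assumes a true panel with a unit identifier (called id here, which is hypothetical and does not appear in the question's variable list) and a 0/1 treatment column named treat. If the data are repeated cross-sections with no unit identifier, this merge step does not apply as written.

```r
library(MatchIt)

# Simulated two-period panel; all names and values are illustrative
set.seed(2)
n <- 200
units <- data.frame(id = seq_len(n),
                    treat = rbinom(n, 1, 0.4),
                    URBAN = rbinom(n, 1, 0.5),
                    FAMSIZE = sample(2:8, n, replace = TRUE))
panel <- merge(units, data.frame(Post1986 = 0:1), by = NULL)  # every unit in both periods
panel$LIT <- rbinom(nrow(panel), 1, 0.5)

# 1. Match on pre-period rows only
pre <- subset(panel, Post1986 == 0)
m.out <- matchit(treat ~ URBAN + FAMSIZE, data = pre,
                 method = "nearest", distance = "glm")

# 2. Assess balance before estimating anything
summary(m.out)

# 3. Keep only matched units, then bring back both periods for those units
matched_pre <- match.data(m.out)
matched_panel <- merge(panel, matched_pre[, c("id", "weights")], by = "id")

# 4. Only once balance is acceptable: DiD regression on the matched panel
did_fit <- lm(LIT ~ treat * Post1986, data = matched_panel, weights = weights)
coef(did_fit)
```

The outcome here is pure noise, so the coefficients mean nothing; the point is only the order of the steps. Standard errors from a plain lm() fit are also not adjusted for the matching or for repeated observations per unit.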

Instead of doing all of this, though, I would recommend you use a method and package specifically designed for your scenario, like the did package. It uses more sophisticated methods for adjusting for covariates but does so without requiring you to perform multiple steps, each of which might be error prone.
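As for the before-and-after density plots asked about in the question, one possible sketch uses base R graphics on the propensity scores stored in the matchit object. The data are simulated and plot_ps is a hypothetical helper. The plots are only informative once the propensity score model no longer predicts treatment perfectly.

```r
library(MatchIt)

# Simulated data with overlapping groups; all names and values are illustrative
set.seed(3)
n <- 400
dat <- data.frame(URBAN = rbinom(n, 1, 0.5),
                  FAMSIZE = sample(2:8, n, replace = TRUE))
dat$treat <- rbinom(n, 1, plogis(-1.5 + 0.8 * dat$URBAN + 0.1 * dat$FAMSIZE))

m.out <- matchit(treat ~ URBAN + FAMSIZE, data = dat,
                 method = "nearest", distance = "glm")
matched <- match.data(m.out)   # adds a `distance` column with the propensity score

# Hypothetical helper: overlay kernel densities of the score for the two groups
plot_ps <- function(ps, treat, title) {
  d1 <- density(ps[treat == 1], from = 0, to = 1)
  d0 <- density(ps[treat == 0], from = 0, to = 1)
  plot(d1, main = title, xlab = "Propensity score",
       ylim = c(0, max(d1$y, d0$y)))
  lines(d0, lty = 2)
  legend("topright", legend = c("Treated", "Control"), lty = 1:2, bty = "n")
}

op <- par(mfrow = c(1, 2))
plot_ps(m.out$distance, dat$treat, "Before matching")
plot_ps(matched$distance, matched$treat, "After matching")
par(op)
```

The cobalt package's bal.plot(m.out, var.name = "distance", which = "both") should produce a comparable figure with less code.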

Upvotes: 0
