Julian Ekberg

Reputation: 1

Master thesis: Autocorrelation/Lagrange multiplier test for master thesis in political science

I'm currently trying to detect how many lags I should include in my linear regression analysis in R.

The study is about whether the presence of commercial military actors (CMA) correlates with or causes more military and/or civilian deaths. My supervisor is very keen on me using a Lagrange multiplier test to determine how many lags I need. However, he is not an R user and can't help me implement it. He also wants me to include panel-corrected standard errors (PCSE) as proposed by Beck and Katz.

Short variable description:
DV = log_military_cas: log transformation of yearly military deaths per country
IV = CMA: dummy variable indicating CMA presence (1) or no presence (0) in a given country-year
Lag variable = lag_md: log_md lagged one year
Data = lagr

This is what my supervisor sent me: Testing for serial correlation. This is what I wrote down in my notes as a grad student: Using the Lagrange multiplier test first recommended by Engle (1984) (but also used by Beck and Katz (1996)), this is done in two steps: 1) estimate the model and save the residuals, and 2) regress these residuals on their first lag and the independent variable. If the lag of the residual is statistically significant in the last regression, more lags of the dependent variable are needed. <-- So just do this, but with a model without any lags of the dependent variable. If you find serial correlation, include a lag of the DV and test again.
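To make the two steps concrete, here is a minimal sketch of how they might look in base R. It runs on simulated data, so `panel`, `x`, `y`, `res` and `res_lag` are illustrative names, not the variables from my thesis. Keeping the country and year dummies in the second regression is my own assumption; the notes only mention the lagged residual and the independent variable.

```r
# Simulated balanced panel: 20 countries x 15 years (illustrative values only).
set.seed(1)
n_country <- 20
n_year <- 15
panel <- expand.grid(year = 1:n_year, country = 1:n_country)  # sorted by country, then year
panel$x <- rbinom(nrow(panel), 1, 0.3)

# Build AR(1) errors within each country so that serial correlation is present.
rho <- 0.6
panel$e <- NA_real_
for (cc in unique(panel$country)) {
  idx <- which(panel$country == cc)
  e <- numeric(length(idx))
  e[1] <- rnorm(1)
  for (t in 2:length(idx)) e[t] <- rho * e[t - 1] + rnorm(1)
  panel$e[idx] <- e
}
panel$y <- 1 + 0.5 * panel$x + panel$e

# Step 1: estimate the model and save the residuals.
# na.exclude keeps the residual vector as long as the data frame.
step1 <- lm(y ~ x + factor(country) + factor(year),
            data = panel, na.action = na.exclude)
panel$res <- resid(step1)

# Lag the residuals one year within each country (first year becomes NA).
panel$res_lag <- ave(panel$res, panel$country,
                     FUN = function(r) c(NA, head(r, -1)))

# Step 2: regress the residuals on their first lag and the regressors.
step2 <- lm(res ~ res_lag + x + factor(country) + factor(year), data = panel)
coef(summary(step2))["res_lag", ]
```

The dependent variable in step 2 is the residual itself, and the lag is taken within each country so that one country's last year never becomes the next country's first lag.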

My question is twofold: 1) what am I doing wrong in the code below, and 2) should the baseline regression include PCSE?

# no lag
lagtest_0a <- lm(log_military_cas ~ CMA + as.factor(country) + as.factor(year), data = lagr)

# save residuals
lagr$Risid_0 <- resid(lagtest_0a)

lagtest_0b <- lm(log_military_cas  ~ CMA + Risid_0 + as.factor(country) + as.factor(year), data = lagr)
summary(lagtest_0b)

# Risid_0 is significant, so I need at least one lag

# lag 1
lagtest_1a <- lm(log_military_cas ~ CMA + lag_md + as.factor(country) + as.factor(year), data = lagr)

# save new residuals
lagr$Risid1 <- resid(lagtest_1a)

# here the following error occurs:
Error in `$<-.data.frame`(`*tmp*`, Risid1, value = c(`2` = 1.84005148256506,  : 
  replacement has 2855 rows, data has 2856

# Then I'm thinking, maybe I shouldn't store the residuals in the lagr data frame. So I try storing them on their own instead.

# save new residuals in a new way
Risid1 <- resid(lagtest_1a)

# rerun model
lagtest1 <- lm(log_military_cas  ~ CMA + Rs_lagtest_md1 + as.factor(country) + as.factor(year), data = lagr)

# Then, the following error occurs:
Error in model.frame.default(formula = log_military_cas ~ CMA + Rs_lagtest_md1 +  : 
  variable lengths differ (found for 'Rs_lagtest_md1')

The problem seems to be that when I include lag_md (which is NA in the first year, since it is lagged), the variables no longer have the same length. As far as I know, R omits NAs by default. I even tried specifying this explicitly with na.action = na.omit, but I get the same error.
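A small toy example, as far as I can tell, reproduces the length mismatch. It assumes R's default `na.action` option (`na.omit`); `toy`, `x_lag`, `res` and `res2` are made-up names for illustration. With `na.omit`, `resid()` returns one value per row actually used in the fit, so it is shorter than the data frame. With `na.exclude`, the residuals are padded with NA at the dropped rows.

```r
# Toy data: one NA in the regressor, mimicking the first year of a lagged variable.
toy <- data.frame(y     = c(1.2, 2.1, 2.9, 4.2, 5.1, 5.8),
                  x_lag = c(NA, 1, 2, 3, 4, 5))

fit_omit <- lm(y ~ x_lag, data = toy)                          # default: na.omit
fit_excl <- lm(y ~ x_lag, data = toy, na.action = na.exclude)  # pads residuals

length(resid(fit_omit))  # 5: the NA row is dropped, too short for a 6-row column
length(resid(fit_excl))  # 6: NA is placed where the row was dropped

# With na.exclude the residuals fit straight into the data frame.
toy$res <- resid(fit_excl)

# Alternative for an na.omit fit: match residuals to rows by row name.
toy$res2 <- NA_real_
toy[names(resid(fit_omit)), "res2"] <- resid(fit_omit)
```

Both approaches leave NA in the first row. Specifying `na.action = na.omit` explicitly changes nothing, because that is already the default behaviour that produces the shorter vector.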

Upvotes: 0

Views: 62

Answers (0)
