Lucy

Reputation: 31

Linear Regression Model with Newey West Standard Errors and Interaction Terms in R isn't calculating coefficients for all my variables

I'm working on a regression of how income is affected by the occurrence of natural disasters. After running autocorrelation (AC) and heteroskedasticity tests, I need to apply Newey-West (NW) standard errors to my model.

My model output from stargazer omits the coefficients for the interaction terms associated with the Flood dummy (see the regression output below).

I also fitted the model with only the Flood dummy interaction terms, and these were all significant. When not interacted, both terms are also significant.

Any help on how to get the coefficients on these would be highly appreciated!

Here is my code:

# Fit the model including all lagged disaster dummy variables, the intensity score, and duration dummies
model_pooled_ols_5e <- lm(
  log_Average_Wage_and_Salary_Income ~
    Lagged1y_Disaster_Dummy_Storm +
    Lagged1y_Disaster_Dummy_Tropical_Cyclone_Storm +
    Lagged1y_Disaster_Dummy_Wildfire +
    Lagged1y_Disaster_Dummy_Flood +
    Intensity_Score +
    Duration_Under_1_Week_Dummy +
    Duration_1_to_3_Weeks_Dummy +
    Duration_1_Month_Dummy +
    Duration_Over_1_Month_Dummy +
    Inner_Regional_Dummy +
    Major_Cities_Dummy +
    Outer_Regional_Dummy +
    Remote_Very_Remote_Dummy +
    Interaction_Storm_Inner_Regional +
    Interaction_Storm_Major_Cities +
    Interaction_Storm_Outer_Regional +
    Interaction_Storm_Remote_Very_Remote +
    Interaction_Tropical_Cyclone_Inner_Regional +
    Interaction_Tropical_Cyclone_Major_Cities +
    Interaction_Tropical_Cyclone_Outer_Regional +
    Interaction_Tropical_Cyclone_Remote_Very_Remote +
    Interaction_Wildfire_Inner_Regional +
    Interaction_Wildfire_Major_Cities +
    Interaction_Wildfire_Outer_Regional +
    Interaction_Wildfire_Remote_Very_Remote +
    Interaction_Flood_Inner_Regional +
    Interaction_Flood_Major_Cities +
    Interaction_Flood_Outer_Regional +
    Interaction_Flood_Remote_Very_Remote +
    Year,
  data = Merged_ABS_EMDAT_simple_duplicates_cleaned_v5
)
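The code above fits the model but does not show how the Newey-West errors reach the table. One common route is sketched below, assuming the sandwich and lmtest packages and a simulated toy data set. The names toy, fit, nw_vcov and nw_se are hypothetical, and lag = 2 is an arbitrary choice for illustration, not a recommendation.

```r
library(sandwich)  # NeweyWest()
library(lmtest)    # coeftest()

set.seed(1)
n <- 200
toy <- data.frame(x = rnorm(n))

# AR(1) errors, so the residuals are serially correlated
e <- as.numeric(arima.sim(list(ar = 0.5), n = n))
toy$y <- 1 + 2 * toy$x + e

fit <- lm(y ~ x, data = toy)

# Newey-West (HAC) covariance matrix of the coefficients
nw_vcov <- NeweyWest(fit, lag = 2, prewhite = FALSE, adjust = TRUE)

# Coefficient table using the Newey-West standard errors
print(coeftest(fit, vcov. = nw_vcov))

# Standard errors as a plain vector, e.g. for a table package
nw_se <- sqrt(diag(nw_vcov))
# With stargazer installed, these could be passed through its `se` argument:
# stargazer::stargazer(fit, se = list(nw_se), type = "text")
```

Changing the standard errors this way leaves the coefficient estimates untouched, so any coefficient that lm() reports as NA stays blank in the table whichever covariance estimator is used.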

==================================================================================
                                                  Dependent variable:
                                                ----------------------------------
                                                log_Average_Wage_and_Salary_Income

Lagged1y_Disaster_Dummy_Storm -0.038
(0.036)

Lagged1y_Disaster_Dummy_Tropical_Cyclone_Storm 0.162***
(0.032)

Lagged1y_Disaster_Dummy_Wildfire 0.017
(0.021)

Lagged1y_Disaster_Dummy_Flood -0.099***
(0.024)

Intensity_Score -0.035
(0.065)

Duration_Under_1_Week_Dummy -0.116***
(0.025)

Duration_1_to_3_Weeks_Dummy -0.063***
(0.023)

Duration_1_Month_Dummy -0.086***
(0.020)

Duration_Over_1_Month_Dummy

Inner_Regional_Dummy 0.019
(0.045)

Major_Cities_Dummy 0.194***
(0.044)

Outer_Regional_Dummy 0.008
(0.044)

Remote_Very_Remote_Dummy

Interaction_Storm_Inner_Regional -0.038
(0.024)

Interaction_Storm_Major_Cities -0.079***
(0.021)

Interaction_Storm_Outer_Regional 0.076**
(0.037)

Interaction_Storm_Remote_Very_Remote -0.158***
(0.048)

Interaction_Tropical_Cyclone_Inner_Regional 0.180***
(0.042)

Interaction_Tropical_Cyclone_Major_Cities 0.023
(0.025)

Interaction_Tropical_Cyclone_Outer_Regional 0.146***
(0.026)

Interaction_Tropical_Cyclone_Remote_Very_Remote 0.319***
(0.069)

Interaction_Wildfire_Inner_Regional 0.054**
(0.026)

Interaction_Wildfire_Major_Cities -0.036
(0.023)

Interaction_Wildfire_Outer_Regional 0.109***
(0.035)

Interaction_Wildfire_Remote_Very_Remote -0.208***
(0.055)

Interaction_Flood_Inner_Regional

Interaction_Flood_Major_Cities

Interaction_Flood_Outer_Regional

Interaction_Flood_Remote_Very_Remote

Year 0.030***
(0.002)

Constant -49.791***
(3.301)


Note: Newey-West standard errors applied
Observations 1,967
R2 0.533
Adjusted R2 0.527
Residual Std. Error 0.169 (df = 1942)
F Statistic 92.337*** (df = 24; 1942)

Note: *p<0.1; **p<0.05; ***p<0.01

# Subset the correlation matrix to include only the interaction terms and their correlation with all other variables
interaction_terms <- c("Interaction_Flood_Inner_Regional", "Interaction_Flood_Major_Cities", "Interaction_Flood_Outer_Regional", "Interaction_Flood_Remote_Very_Remote")

# Extract the relevant part of the correlation matrix
correlation_matrix_interactions <- correlation_matrix_all[interaction_terms, ]
print(correlation_matrix_interactions)

                                     log_Average_Wage_and_Salary_Income Lagged1y_Disaster_Dummy_Storm
Interaction_Flood_Inner_Regional                            -0.18158278                   -0.03057133
Interaction_Flood_Major_Cities                               0.09542833                   -0.05145906
Interaction_Flood_Outer_Regional                            -0.18727074                   -0.02185281
Interaction_Flood_Remote_Very_Remote                        -0.08479981                   -0.01270038

                                     Lagged1y_Disaster_Dummy_Tropical_Cyclone_Storm
Interaction_Flood_Inner_Regional                                        -0.08403034
Interaction_Flood_Major_Cities                                          -0.14144371
Interaction_Flood_Outer_Regional                                        -0.10310793
Interaction_Flood_Remote_Very_Remote                                    -0.03490910

                                     Lagged1y_Disaster_Dummy_Wildfire Lagged1y_Disaster_Dummy_Flood
Interaction_Flood_Inner_Regional                          -0.09474129                    0.04440241
Interaction_Flood_Major_Cities                            -0.11059196                   -0.08919554
Interaction_Flood_Outer_Regional                          -0.12563254                    0.30334249
Interaction_Flood_Remote_Very_Remote                      -0.04422148                    0.09880484

                                     Intensity_Score Duration_Under_1_Week_Dummy
Interaction_Flood_Inner_Regional         -0.13298999                 0.015142296
Interaction_Flood_Major_Cities           -0.36021593                -0.011496344
Interaction_Flood_Outer_Regional         -0.02933677                -0.055770679
Interaction_Flood_Remote_Very_Remote     -0.03530315                 0.006790861

                                     Duration_1_to_3_Weeks_Dummy Duration_1_Month_Dummy
Interaction_Flood_Inner_Regional                      0.02727324            0.008551952
Interaction_Flood_Major_Cities                        0.20987979           -0.168472347
Interaction_Flood_Outer_Regional                      0.03265276           -0.035494128
Interaction_Flood_Remote_Very_Remote                 -0.05184300            0.014811205

                                     Duration_Over_1_Month_Dummy Inner_Regional_Dummy
Interaction_Flood_Inner_Regional                     -0.07894394           0.70226950
Interaction_Flood_Major_Cities                       -0.15504594          -0.22967180
Interaction_Flood_Outer_Regional                      0.06066184          -0.16742337
Interaction_Flood_Remote_Very_Remote                  0.06464149          -0.05668429

                                     Major_Cities_Dummy Outer_Regional_Dummy Remote_Very_Remote_Dummy
Interaction_Flood_Inner_Regional             -0.3672568          -0.16640793              -0.05299021
Interaction_Flood_Major_Cities                0.4391790          -0.28010542              -0.08919554
Interaction_Flood_Outer_Regional             -0.4506359           0.70655484              -0.06502069
Interaction_Flood_Remote_Very_Remote         -0.1525711          -0.06913159               0.75122639

                                     Interaction_Storm_Inner_Regional Interaction_Storm_Major_Cities
Interaction_Flood_Inner_Regional                          -0.04402616                    -0.12742112
Interaction_Flood_Major_Cities                            -0.07410684                    -0.21448105
Interaction_Flood_Outer_Regional                          -0.05402151 -

Upvotes: 3

Views: 92

Answers (1)

Robert Long

Reputation: 6887

Without a minimal working example it is very difficult to give a definite answer, but I suspect that the variables in question are categorical and that the estimate for one level of each is absent. If that is the case, this is normal, expected behaviour: the omitted level is the reference level, and its "missing" estimate forms part of the intercept. Check out the emmeans package.

Here is an example of this in action:

library(ggplot2)
library(emmeans)

set.seed(123)

# Number of observations per group
n <- 30

# Create the 'colour' factor variable
colour <- factor(rep(c("Red", "Green", "Blue"), each = n))

# Simulate the response variable 'value' with different means for each group
value <- c(rnorm(n, mean = 5, sd = 1),   # Red
           rnorm(n, mean = 6, sd = 1),   # Green
           rnorm(n, mean = 7, sd = 1))   # Blue

data <- data.frame(colour, value)

So, we have created a dataset with one independent variable, "colour", which has 3 levels: Red, Green and Blue. First, we plot the data to visualise the differences (it's always a good idea to visualise your data):

ggplot(data, aes(x = colour, y = value, fill = colour)) +
  geom_boxplot() +
  labs(title = "Boxplot of Simulated Data by Colour",
       x = "Colour",
       y = "Value") +
  theme_minimal()

[Image: boxplot of the simulated data by colour]

And now we fit the model:

m0 <- lm(value ~ colour, data = data)
summary(m0)

which produces:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.0244     0.1639  42.869  < 2e-16 ***
colourGreen  -0.8461     0.2317  -3.651 0.000445 ***
colourRed    -2.0715     0.2317  -8.939 5.98e-14 ***

...and we see that the estimate for Blue is missing.
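Blue is absent because R takes the first factor level (alphabetically, Blue) as the reference: the intercept is the Blue mean and the other coefficients are differences from it. The short sketch below checks this and switches the reference level with base R's relevel(). It rebuilds the same simulated data under the hypothetical name dat, and colour_red_ref and m1 are illustrative names as well.

```r
set.seed(123)
n <- 30
colour <- factor(rep(c("Red", "Green", "Blue"), each = n))
value <- c(rnorm(n, mean = 5, sd = 1),   # Red
           rnorm(n, mean = 6, sd = 1),   # Green
           rnorm(n, mean = 7, sd = 1))   # Blue
dat <- data.frame(colour, value)

m0 <- lm(value ~ colour, data = dat)

# The intercept equals the sample mean of the reference level (Blue)
blue_mean <- mean(dat$value[dat$colour == "Blue"])
print(all.equal(unname(coef(m0)["(Intercept)"]), blue_mean))

# Make Red the reference level instead; Blue now gets its own coefficient
dat$colour_red_ref <- relevel(dat$colour, ref = "Red")
m1 <- lm(value ~ colour_red_ref, data = dat)
print(names(coef(m1)))
```

Both fits describe the same three group means; only the parameterisation changes.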

Now we estimate marginal means:

em_means <- emmeans(m0, ~ colour)
summary(em_means)

which gives us:

 colour emmean    SE df lower.CL upper.CL
 Blue     7.02 0.164 87     6.70     7.35
 Green    6.18 0.164 87     5.85     6.50
 Red      4.95 0.164 87     4.63     5.28

which recovers the "missing" estimate for Blue: its marginal mean of 7.02 equals the intercept of the fitted model.
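The same mechanism applies when the dummies are built by hand: if a full set of 0/1 columns is entered alongside an intercept, lm() cannot separate them and reports NA for one of them. Below is a minimal sketch with made-up data (region, toy and fit are hypothetical names, not the question's data) that lists such coefficients using base R only. Whether this is what happens in the question's model cannot be confirmed without the data.

```r
set.seed(42)
n_per_group <- 30
region <- factor(rep(c("Inner", "Major", "Outer", "Remote"), each = n_per_group))

# One hand-made 0/1 dummy per region: together they sum to 1 in every row
toy <- data.frame(
  y      = rnorm(length(region)),
  inner  = as.integer(region == "Inner"),
  major  = as.integer(region == "Major"),
  outer  = as.integer(region == "Outer"),
  remote = as.integer(region == "Remote")
)

fit <- lm(y ~ inner + major + outer + remote, data = toy)

# Coefficients that lm() could not estimate are returned as NA
dropped <- names(which(is.na(coef(fit))))
print(dropped)

# alias() shows how each dropped term depends on the retained ones
print(alias(fit)$Complete)
```

Dropping one dummy from each exhaustive set, or using factor variables and letting R choose the reference level, removes the NA without changing the fitted values.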

Upvotes: 0
