Why does ppmlhdfe give different results for manually interacted variables?

I am running a regression using ppmlhdfe with two dummy variables and the interaction between them. This is constructed as follows:

gen interaction = D1*D2 
ppmlhdfe y D1#D2 control i.year, vce(robust)
ppmlhdfe y D1 interaction D2 control i.year, vce(robust)

I ran this comparison mostly to check that the results are the same, because the way esttab outputs and labels the first version is ugly and confusing. However, while the coefficients on D1 and D2 in the second version match those of D1 = 1, D2 = 0 and D1 = 0, D2 = 1 in the first version, the interaction term is completely different: wrong sign, wrong magnitude, significant in the first version but insignificant in the second. The D1 = 0, D2 = 0 cell, which is listed explicitly in the first version's output, is omitted due to collinearity, so I feel the results really should be identical. I re-ran this with the reg command to make sure it is not a ppml issue, but the same thing happened. I have also tried adding the dummies and the interaction as explicit factor variables:

ppmlhdfe y i.D1 i.interaction i.D2 control i.year, vce(robust)

but the outcome did not change.

I found a response to a similar question, but it deals with a manual and a continuous interaction term, which is a bit different from my case. I tried to apply it anyway, generating both levels of the first dummy variable and interacting each with the other dummy, as follows:

tab D1, gen(d)
gen d1D2 = d1*D2
gen d2D2 = d2*D2
ppmlhdfe y D1 D2 d1D2 d2D2 control i.year, vce(robust)

but d2D2 is, unsurprisingly, omitted because of collinearity, and the results are unchanged. Does anyone have an idea why this happens?

EDIT1: minimal working example:

sysuse auto.dta, clear
gen high_price = 0
replace high_price = 1 if price>6165
gen interaction = high_price*foreign 
ppmlhdfe trunk high_price interaction foreign headroom, vce(robust)
ppmlhdfe trunk high_price#foreign headroom, vce(robust)

EDIT2: please note I have also responded to the Statalist post referenced above, as I realised after posting here that there might be more Stata-specific help available there.

Upvotes: 1

Answers (3)

William Jergins

Reputation: 21

The single pound sign (i.e., var1#var2) includes only the interaction terms, not the main effects of the two variables. To get the interaction and both main effects, you need the double pound sign (i.e., var1##var2), which is shorthand for var1 var2 var1#var2.

As such, you are not estimating the same model with

ppmlhdfe y D1#D2 control i.year, vce(robust)

and

ppmlhdfe y D1 interaction D2 control i.year, vce(robust)

-- which is why you get different results.

Upvotes: 2

Poonacha

Reputation: 1316

When you run i.D1#i.D2, the base category is 0.D1#0.D2, and each of the remaining combinations (1.D1#0.D2, 0.D1#1.D2, and 1.D1#1.D2) is estimated relative to that base. So the coefficient you observe for 1.D1#1.D2 when you run i.D1#i.D2 is not the interaction effect alone; it equals the sum of the effects in the manual specification: D1 + D2 + D1#D2.
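
To spell out that sum, here is a short sketch of the two parameterizations side by side. The symbols (alpha, beta, gamma) are my own notation and do not come from the question, and the dots stand for the remaining controls. Both specifications fit the same four cell means, so the coefficients should map onto each other as follows:

```latex
\begin{aligned}
\text{manual / \#\#:}\quad \log E[y] &= \alpha + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_1 D_2 + \dots \\
\text{single \#:}\quad \log E[y] &= \alpha + \gamma_{10}\,\mathbf{1}[D_1{=}1, D_2{=}0] + \gamma_{01}\,\mathbf{1}[D_1{=}0, D_2{=}1] + \gamma_{11}\,\mathbf{1}[D_1{=}1, D_2{=}1] + \dots \\
\gamma_{10} &= \beta_1, \qquad \gamma_{01} = \beta_2, \qquad \gamma_{11} = \beta_1 + \beta_2 + \beta_3 \\
\beta_3 &= \gamma_{11} - \gamma_{10} - \gamma_{01}
\end{aligned}
```

This is consistent with what the question reports: the D1 and D2 coefficients match across the two versions, and only the 1#1 term differs.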

Upvotes: 0

Pia

Reputation: 87

Following advice received on Statalist from Jeff Wooldridge and Carlo Lazzaro, I tried

ppmlhdfe trunk high_price##foreign headroom, vce(robust)

and

ppmlhdfe trunk high_price c.high_price#c.foreign foreign headroom, vce(robust)

Both yielded the same result as the manual interaction term. I am still not sure what is going on with the single #, but I conclude that it is better to use ##.
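
For anyone who wants to check how the single # output relates to these results, a possible sketch is below. It assumes ppmlhdfe is installed and that lincom accepts these factor-variable coefficient names after it, which I have not verified. The linear combination takes the 1#1 cell and subtracts the two other reported cells, which should reproduce the coefficient on the manual interaction term.

```stata
* Sketch only: assumes ppmlhdfe is installed (ssc install ppmlhdfe) and that
* lincom works on its stored results; not verified here.
sysuse auto.dta, clear
gen high_price = price > 6165

* Single #: one coefficient per cell, relative to the omitted 0#0 base cell
ppmlhdfe trunk i.high_price#i.foreign headroom, vce(robust)

* 1#1 cell minus the two other reported cells: should match the coefficient
* on the manual interaction term
lincom 1.high_price#1.foreign - 1.high_price#0.foreign - 0.high_price#1.foreign
```

If the lincom estimate matches the interaction coefficient from the ## version, the two specifications are the same fitted model written in different parameterizations.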

Upvotes: 0
