StephenB

Reputation: 95

Marginal Effects of Factor variables

When using marginal effects after logit in Stata, why do I get different results depending on how I specify factor variables?

For example:

    sysuse auto
    gen expensive=0
    replace expensive=1 if price>=4000
    qui logit expensive i.foreign 
    margins, dydx(foreign)

    qui logit expensive foreign
    margins, dydx(foreign)

I get that one of them is taking the marginal effect with respect to foreign, and the other with respect to 1.foreign. I'm just not clear on why this is happening; my prior would have been that these are the same thing.

Any help would be appreciated. Most importantly, which one is correct?

Upvotes: 1

Views: 454

Answers (1)

dimitriy

Reputation: 9470

Here's what Stata is doing under the hood (pun intended):

    sysuse auto, clear
    gen expensive=0
    replace expensive=1 if price>=4000
    logit expensive i.foreign, coeflegend // legend shows the coefficient is stored as _b[1.foreign]
    predict phat, pr

    /* Change in Pr(Expensive) for a tiny change in foreign */
    margins, dydx(foreign) continuous // this is like your second spec
    gen double me_foreign = phat*(1-phat)*_b[1.foreign]
    sum me_foreign

    /* Discrete change in Pr(Expensive) when foreign goes from 0 to 1 for every car */
    margins, dydx(foreign)
    clonevar foreign_orig = foreign // keep a copy so the data can be restored
    replace foreign=1
    predict phat1, pr
    replace foreign=0
    predict phat0, pr
    gen double fd_foreign = phat1 - phat0
    sum fd_foreign
    replace foreign = foreign_orig // put the original values back
    drop foreign_orig

When you omit the i. prefix, Stata calculates the change in the probability of being expensive as if there were a tiny change in foreign. You can mimic that by adding the continuous option to margins, dydx() instead of fitting a second model. Stata calculates the derivative of the predicted probability of being expensive with respect to foreign for every observation and then takes the average. This doesn't quite make sense, since it doesn't correspond to a sensible manipulation: foreign is binary, but the derivative gives you the change in probability for a small change in foreign, as if it were continuous. In linear models this difference does not matter, but in non-linear ones it can.
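The point about linear models can be checked directly. A minimal sketch, assuming the same auto data and the same expensive indicator as in the question; swapping logit for regress (a linear probability model) is my illustration, not something from the original post:

```stata
sysuse auto, clear
gen expensive=0
replace expensive=1 if price>=4000

* Linear probability model: the slope of the prediction is constant
regress expensive i.foreign
margins, dydx(foreign)            // discrete change from 0 to 1
margins, dydx(foreign) continuous // derivative, treating foreign as continuous
```

Both margins calls should report the coefficient on 1.foreign, because in a linear prediction the derivative and the 0-to-1 difference are the same number.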

With the i. prefix, Stata calculates a finite difference: the predicted probability as if every car were foreign minus the predicted probability as if every car were domestic, averaged over the sample. This is arguably more sensible with a binary variable. On the other hand, the difference here (and in many empirical applications) is not that large, and you often see people do the former instead of the latter.
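Written out, the two quantities being compared look like this. It is a sketch in my own notation, assuming the fitted model is Pr(expensive = 1) = Λ(β0 + β1·foreign), where Λ is the logistic CDF and β1 is the coefficient Stata labels 1.foreign:

```latex
% Predicted probability for car i under the fitted logit
\[
\hat p_i = \Lambda(\hat\beta_0 + \hat\beta_1\,\mathit{foreign}_i),
\qquad
\Lambda(z) = \frac{e^{z}}{1 + e^{z}}
\]

% Without i. (or with the continuous option): average derivative
\[
\frac{1}{N}\sum_{i=1}^{N} \hat p_i\,(1 - \hat p_i)\,\hat\beta_1
\]

% With i.: average finite difference, foreign = 1 versus foreign = 0
\[
\frac{1}{N}\sum_{i=1}^{N}\bigl[\Lambda(\hat\beta_0 + \hat\beta_1) - \Lambda(\hat\beta_0)\bigr]
= \Lambda(\hat\beta_0 + \hat\beta_1) - \Lambda(\hat\beta_0)
\]
```

The second expression is the mean of me_foreign and the third is the mean of fd_foreign in the code above. The sum in the third collapses to a single difference only because foreign is the sole regressor here; with other covariates each term would vary across observations.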

Upvotes: 1
