Reputation: 75
Thank you for taking a crack at my question! I am implementing a conditional logistic regression in Stata. I have choice data in long format, where every choice consists of two available options and the decision-maker can pick only one. I have implemented it using the Stata clogit command, which, in my understanding, creates fixed effects for every choice in the data and partials them out before regressing the dependent variable on the remaining explanatory variables in the logistic regression. To convince myself that clogit does what I think it does, I tried to reproduce the results using the logit command with the fixed effects added manually. FWIW, that is not straightforward with large data sets given Stata's limit on the number of explanatory variables, but the problem persists in the following smaller MWE:
* Retrieve MWE data set
webuse lowbirth2, clear
* Add arbitrary cluster variable, because in my real problem the data is clustered
gen cluster = ceil(_n/14)
clogit low lwt smoke ptd ht ui i.race, group(pairid) cluster(cluster)
Conditional (fixed-effects) logistic regression
Number of obs = 112
Wald chi2(7) = 211.55
Prob > chi2 = 0.0000
Log pseudolikelihood = -25.794271 Pseudo R2 = 0.3355
(Std. Err. adjusted for 8 clusters in cluster)
------------------------------------------------------------------------------
| Robust
low | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwt | -.0183757 .0111176 -1.65 0.098 -.0401657 .0034144
smoke | 1.400656 .4670183 3.00 0.003 .4853172 2.315995
ptd | 1.808009 .6162347 2.93 0.003 .600211 3.015807
ht | 2.361152 .9149873 2.58 0.010 .5678096 4.154494
ui | 1.401929 .5968851 2.35 0.019 .2320559 2.571802
|
race |
black | .5713643 .5699717 1.00 0.316 -.5457596 1.688488
other | -.0253148 .5197248 -0.05 0.961 -1.043957 .9933272
------------------------------------------------------------------------------
logit low lwt smoke ptd ht ui i.race i.pairid, cluster(cluster)
Logistic regression Number of obs = 112
Wald chi2(6) = .
Prob > chi2 = .
Log pseudolikelihood = -51.588542 Pseudo R2 = 0.3355
(Std. Err. adjusted for 8 clusters in cluster)
------------------------------------------------------------------------------
| Robust
low | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwt | -.0367513 .0222351 -1.65 0.098 -.0803314 .0068288
smoke | 2.801312 .9340365 3.00 0.003 .9706343 4.63199
ptd | 3.616018 1.232469 2.93 0.003 1.200422 6.031613
ht | 4.722303 1.829975 2.58 0.010 1.135619 8.308988
ui | 2.803858 1.19377 2.35 0.019 .4641118 5.143605
|
race |
black | 1.142729 1.139943 1.00 0.316 -1.091519 3.376977
other | -.0506296 1.03945 -0.05 0.961 -2.087913 1.986654
|
pairid | ... omitted for brevity ...
Looking at these two outputs, we can see that the coefficients, standard errors, and log pseudolikelihood are not merely different: the logit values are exactly double the clogit values, as if the dependent variable had been scaled by a factor of 2. I should add that when I don't cluster the standard errors, they no longer double exactly. So clogit doesn't seem to just partial out fixed effects on the back end, but what does it do? Neither the documentation nor the clogit.ado file itself has resolved this for me.
Upvotes: 4
Views: 1654
Reputation: 1348
The difference in estimates that you're observing is the bias due to the incidental parameters problem, which arises when logit is estimated with a full set of group dummies in a panel with only a few observations per group.
See Greene (2004) for a discussion of this bias, and note that the bias is (1) away from zero, and (2) as high as 100% when T=2.
In your case T is not time, but the number of individuals in the pair (i.e. 2), thus you should expect the bias to be in the neighbourhood of 100%.
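For T=2 the factor of two can be worked out directly. What follows is a sketch in my own notation, not Stata's, assuming every pair contains exactly one positive outcome (as in 1:1 matched data such as lowbirth2). Here Λ(u) = 1/(1+e^{-u}), α_i is the pair effect, and member 1 of pair i is the one with y=1:

```latex
\begin{align*}
\ell_i(\alpha_i,\beta) &= \log\Lambda(\alpha_i + x_{i1}\beta) + \log\Lambda(-\alpha_i - x_{i2}\beta)
  && \text{(pair $i$, with $y_{i1}=1$, $y_{i2}=0$)} \\
\frac{\partial \ell_i}{\partial \alpha_i} = 0
  &\;\Longrightarrow\; \hat\alpha_i(\beta) = -\tfrac{1}{2}\,(x_{i1}+x_{i2})\,\beta \\
\ell_i\bigl(\hat\alpha_i(\beta),\beta\bigr) &= 2\log\Lambda\!\bigl(\tfrac{1}{2}\,\Delta x_i\,\beta\bigr),
  \qquad \Delta x_i = x_{i1}-x_{i2}
\end{align*}
```

Summed over pairs, this profile log likelihood is twice the function Σ_i log Λ(Δx_i b) evaluated at b = β/2, and that function is the conditional log likelihood for pairs. Under these assumptions the dummy-variable logit maximum therefore sits at exactly twice the conditional estimate, with twice the log likelihood, which is consistent with -51.588542 = 2 × -25.794271 in the question's output.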
If you reread the PDF documentation for clogit, in particular the second paragraph of the Fixed-effects logit section under Remarks and examples, you'll see how clogit avoids this problem.
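The idea can be sketched for a single pair as follows. The notation is mine and assumes the usual logit model with a pair-specific intercept α_i: conditioning on there being exactly one positive outcome in the pair makes α_i drop out of the probability, so it never has to be estimated.

```latex
\begin{align*}
\Pr(y_{i1}=1,\,y_{i2}=0 \mid y_{i1}+y_{i2}=1)
 &= \frac{e^{\alpha_i + x_{i1}\beta}}{e^{\alpha_i + x_{i1}\beta} + e^{\alpha_i + x_{i2}\beta}} \\
 &= \frac{e^{x_{i1}\beta}}{e^{x_{i1}\beta} + e^{x_{i2}\beta}}
  = \Lambda\bigl((x_{i1}-x_{i2})\,\beta\bigr),
 \qquad \Lambda(u) = \frac{1}{1+e^{-u}}
\end{align*}
```

Only within-pair differences in the covariates enter, which is why no group dummies appear in the clogit output.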
Since the bias from using logit as you do is 100% relative to clogit, the relationship you document between the estimated coefficients is the expected behaviour (or really misbehaviour, in the logit case) of the two estimators.
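If you want to check the relationship on the point estimates alone, a minimal sketch along the following lines may help. It reuses the data set and variables from the question, drops the cluster() option, and stores the clogit results in scalars whose names (b_clogit, ll_clogit) are just illustrative:

```stata
* Sketch: compare point estimates and log likelihoods without clustering
webuse lowbirth2, clear

* Conditional logit: pair effects are conditioned out, not estimated
quietly clogit low lwt smoke ptd ht ui i.race, group(pairid)
scalar b_clogit  = _b[lwt]
scalar ll_clogit = e(ll)

* Ordinary logit with one dummy per pair
quietly logit low lwt smoke ptd ht ui i.race i.pairid

* Ratios of the logit results to the stored clogit results
display "coefficient ratio (lwt): " _b[lwt] / b_clogit
display "log-likelihood ratio:    " e(ll) / ll_clogit
```

Going by the output in the question, both ratios should come out at 2, since clustering changes only the standard errors and not the point estimates or the log likelihood.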
Upvotes: 3