I have microdata and I am running a regression of log wages on industry dummies.
The regression output includes a coefficient for each industry, which I want to save in a new variable named wd (wage differential).
The code below shows what I want to do, but in reality I have hundreds of industries and almost 30 years.
How could I write a loop that does this efficiently?
reg lnwage i.industry if year == 2002
gen wd = 0
replace wd = 0 if industry==1 & year==2002
replace wd = _b[2.industry] if industry==2 & year==2002
replace wd = _b[3.industry] if industry==3 & year==2002
replace wd = _b[4.industry] if industry==4 & year==2002
replace wd = _b[5.industry] if industry==5 & year==2002
replace wd = _b[6.industry] if industry==6 & year==2002
replace wd = _b[7.industry] if industry==7 & year==2002
replace wd = _b[8.industry] if industry==8 & year==2002
replace wd = _b[9.industry] if industry==9 & year==2002
replace wd = _b[10.industry] if industry==10 & year==2002
replace wd = _b[11.industry] if industry==11 & year==2002
replace wd = _b[12.industry] if industry==12 & year==2002
replace wd = _b[13.industry] if industry==13 & year==2002
replace wd = _b[14.industry] if industry==14 & year==2002
replace wd = _b[15.industry] if industry==15 & year==2002
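A literal loop version of the code above could look like the minimal sketch below. It assumes integer-coded `industry` and `year` variables, the question's `lnwage`, and the default base category (the lowest industry code present in each year's estimation sample, whose differential is set to 0, as in the question). The data-generation lines are simulated, purely hypothetical toy data; with real data you would drop them. It runs one regression per year and one `replace` per non-base industry, so it will be slow with hundreds of industries.

```stata
* Simulated toy data (hypothetical): 5 industries observed in 3 years
clear
set seed 2002
set obs 3000
gen year = 2000 + mod(_n, 3)
gen industry = 1 + mod(floor(_n / 3), 5)
gen lnwage = 2 + 0.1 * industry + rnormal(0, 0.2)

* One regression per year, then one replace per industry in that year's sample
gen double wd = .
quietly levelsof year, local(years)
foreach y of local years {
    quietly regress lnwage i.industry if year == `y'
    quietly levelsof industry if e(sample), local(inds)
    * the lowest level is the default base category, so its differential is 0
    gettoken base others : inds
    quietly replace wd = 0 if industry == `base' & year == `y'
    foreach i of local others {
        quietly replace wd = _b[`i'.industry] if industry == `i' & year == `y'
    }
}
```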
The most efficient loop is no loop at all.
Consider this example. For a regression with just one factor variable, predict does almost all of what you want directly.
. sysuse auto, clear
(1978 Automobile Data)
. regress price i.rep78
      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------
. predict foo
(option xb assumed; fitted values)
(5 missing values generated)
. tabdisp rep78, c(foo)

-------------------------
   Repair |
   Record |
     1978 | Fitted values
----------+--------------
        1 |        4564.5
        2 |      5967.625
        3 |      6429.233
        4 |        6071.5
        5 |          5913
        . |
-------------------------
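Why this works: with a single factor variable, the fitted value for every observation in level $k$ of rep78 is the intercept plus that level's coefficient, and the base level's coefficient is zero. Writing $\beta_0$ for `_b[_cons]` and $\beta_k$ for `_b[k.rep78]`:

$$
\hat{y}_k = \beta_0 + \beta_k, \qquad \beta_1 = 0
\quad\Longrightarrow\quad
\beta_k = \hat{y}_k - \beta_0 = \hat{y}_k - \hat{y}_1 .
$$

For example, level 2 gives $4564.5 + 1403.125 = 5967.625$ and level 5 gives $4564.5 + 1348.5 = 5913$, matching the table. So subtracting either the intercept or the base level's fitted value recovers the coefficients.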
There are various ways to turn these fitted values into differentials relative to the base level: one is to subtract the intercept directly, and another is some variation on
. egen reference = mean(cond(rep78 == 1, foo, .))
. replace foo = foo - reference
(69 real changes made)
. tabdisp rep78, c(foo)
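The other route, subtracting the intercept directly, can be sketched as follows on the same auto data. It uses the stored `_b[_cons]` instead of building a reference variable with egen; `wd` is just an illustrative name. Restricting predict to `e(sample)` leaves the observations with missing rep78 missing.

```stata
sysuse auto, clear
quietly regress price i.rep78
* with a single factor variable, xb = _b[_cons] + _b[k.rep78]
predict double wd if e(sample), xb
* subtracting the intercept leaves each level's coefficient (0 for base level 1)
replace wd = wd - _b[_cons]
tabdisp rep78, c(wd)
```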
Separate regressions for each year would indeed still mean a loop over years, though not over industries; see Statalist posts for many ways to do that directly.
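A minimal sketch of that year-only loop, assuming the question's variable names (`lnwage`, `industry`, `year`) and using simulated, purely hypothetical toy data in place of the real microdata: each year's fitted values minus that year's intercept give the industry differentials, so no inner loop over industries is needed. A `tempvar` holds the fitted values temporarily. Observations outside the estimation sample (for example, missing `lnwage`) are left with missing `wd`.

```stata
* Simulated toy data (hypothetical), as a stand-in for the real microdata
clear
set seed 2002
set obs 3000
gen year = 2000 + mod(_n, 3)
gen industry = 1 + mod(floor(_n / 3), 5)
gen lnwage = 2 + 0.1 * industry + 0.05 * (year - 2000) + rnormal(0, 0.2)

* One regression per year; fitted value minus intercept is the industry
* coefficient (0 for the base industry), so there is no loop over industries
gen double wd = .
quietly levelsof year, local(years)
foreach y of local years {
    quietly regress lnwage i.industry if year == `y'
    tempvar xb
    quietly predict double `xb' if e(sample), xb
    quietly replace wd = `xb' - _b[_cons] if e(sample)
    drop `xb'
}
```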