germaneconomist

Reputation: 1

Stata: Generate a loop to save regression output as new variable

I have micro data and I am running a regression of wages on industry dummies.

My regression output then includes a coefficient for each industry, which I want to save as a new variable named wd (wage differential).

The code below illustrates what I want to do for a single year, but in reality I have hundreds of industries and almost 30 years of data.

How could I write a loop that does this efficiently?

reg lnwage i.industry if year == 2002

gen wd = 0

replace wd = 0 if industry==1 & year==2002
replace wd = _b[2.industry] if industry==2 & year==2002
replace wd = _b[3.industry] if industry==3 & year==2002
replace wd = _b[4.industry] if industry==4 & year==2002
replace wd = _b[5.industry] if industry==5 & year==2002
replace wd = _b[6.industry] if industry==6 & year==2002
replace wd = _b[7.industry] if industry==7 & year==2002
replace wd = _b[8.industry] if industry==8 & year==2002
replace wd = _b[9.industry] if industry==9 & year==2002
replace wd = _b[10.industry] if industry==10 & year==2002
replace wd = _b[11.industry] if industry==11 & year==2002
replace wd = _b[12.industry] if industry==12 & year==2002
replace wd = _b[13.industry] if industry==13 & year==2002
replace wd = _b[14.industry] if industry==14 & year==2002
replace wd = _b[15.industry] if industry==15 & year==2002

Upvotes: 0

Views: 733

Answers (1)

Nick Cox

Reputation: 37208

The most efficient loop is no loop at all.

Consider this example. For a regression with just one factor variable, predict does almost everything you want directly: the fitted value for each category is the intercept plus that category's coefficient.

. sysuse auto, clear
(1978 Automobile Data)

. regress price i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------

. predict foo
(option xb assumed; fitted values)
(5 missing values generated)

. tabdisp rep78, c(foo)

-------------------------
Repair    |
Record    |
1978      | Fitted values
----------+--------------
        1 |        4564.5
        2 |      5967.625
        3 |      6429.233
        4 |        6071.5
        5 |          5913
        . |              
-------------------------

There are various ways to proceed. One is to subtract the intercept directly; another is some variation on the following, which subtracts the fitted value of the reference category:

. egen reference = mean(cond(rep78 == 1, foo, .))

. replace foo = foo - reference
(69 real changes made)

. tabdisp rep78, c(foo)
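The other route mentioned above, subtracting the intercept directly, could look something like the sketch below. It reuses the auto data from the example; the variable names xb and wd are my own choices for illustration, and double is used only so that the base category comes out as exactly zero.

```stata
* Sketch only: same auto example as above, with a fresh variable
sysuse auto, clear
regress price i.rep78

* Fitted values for the estimation sample (missing where rep78 is missing)
predict double xb if e(sample)

* Fitted value minus intercept = coefficient of each category (0 for the base)
generate double wd = xb - _b[_cons]

tabdisp rep78, c(wd)
```

With the question's data, price and rep78 would presumably become lnwage and industry.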

Handling each year separately would indeed mean looping over years, but see Statalist posts for many ways to do that directly.
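For what it is worth, a loop over years might be sketched as follows. This is only one possible approach, not a tested solution for the question's data: the auto dataset has no year variable, so foreign stands in for year and rep78 for industry, and wd and the temporary variable are names chosen for illustration. Each coefficient is measured relative to whatever base category Stata uses within that group's regression.

```stata
* Sketch only: foreign plays the role of year, rep78 the role of industry
sysuse auto, clear
generate double wd = .

* Collect the distinct values of the grouping variable
levelsof foreign, local(groups)

foreach g of local groups {
    * One regression per group
    quietly regress price i.rep78 if foreign == `g'

    * Fitted values for this group's estimation sample only
    tempvar xb
    quietly predict double `xb' if e(sample)

    * Fitted value minus intercept = coefficient (0 for the base category)
    quietly replace wd = `xb' - _b[_cons] if e(sample)
    drop `xb'
}

tabdisp rep78 foreign, c(wd)
```

With the question's variables, this would presumably become levelsof year and regress lnwage i.industry if year == `y', with one pass per year rather than one replace per industry.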

Upvotes: 2
