BearCoder
BearCoder

Reputation: 17

Simulate with two time periods

I am trying to do a simulation in Stata with 1000 reps, 2 time periods and 400 observations. In the simulation I am trying to do it with fixed effects estimators. Can anyone help? I have written the following code:

clear all

global numid = 400


DEFINE PROGRAM THAT SPECIFIES THE DGP
program treatment, rclass
    drop _all
    
    *SET NUMBER OF INDIVIDUALS 
    set obs $numid
    *DATA GENERATING PROCESS
    generate alpha =  rnormal(0,2)
    generate u     = rnormal()
    gen id=_n
    expand 2
    bysort id: gen t=_n
    generate xstar= rnormal(2,4) 

    generate beta0 = 1
    generate beta1 = 2
    generate rho = -0,2
    generate xit = xstar + rho*alpha
    generate y = beta0+beta1*xit+alpha+u
    
    *CALCULATE OLS ESTIMATES AND SAVE RESULTS 
    regress y xit 
    return scalar b_OLS=_b[xit]
    return scalar se_OLS=_se[xit]
    regress y xit, robust
    return scalar ser_OLS=_se[xit]


end


*NOW RUN PROGRAM 1000 TIMES USING SIMULATE
simulate b_OLS=r(b_OLS) se_OLS=r(se_OLS)  ///
         b_FE=r(b_FE) se_FE=r(se_FE) , ///
         seed(117) reps(1000) :treatment

Upvotes: 0

Views: 31

Answers (1)

langtang
langtang

Reputation: 24832

There are a few things going on here:

  1. Rather than set a global here, I think you should use
args n
set obs `n'
  1. Rather than use drop_all, you should use preserve

  2. Your rho is mis-specfied.. Should be generate rho = -0.2

  3. You are not returning all the scalars that you need, and you are not naming them correctly when you call simulate. The former is corrected in the updated program, while the latter is corrected in the updated usage; both below.

  4. Your two regression models return the same estimates, only the standard error differs, as all you are doing between the two is calling the option robust.

  5. In your original post, you may need to be more explicit about what you are trying to model (i.e. in terms of the time periods). For example, are you trying to return the fixed effect of xit on y, separately by time period? If so, the linear model in the program needs to be estimated separately by time period (for example, using if t==1 / if t==2)

Here is an updated program:

program define treatment, rclass
    args n
    preserve
    *SET NUMBER OF INDIVIDUALS 
    set obs `n'
    *DATA GENERATING PROCESS
    generate alpha =  rnormal(0,2)
    generate u     = rnormal()
    gen id=_n
    expand 2
    bysort id: gen t=_n
    generate xstar= rnormal(2,4) 

    generate beta0 = 1
    generate beta1 = 2
    generate rho = -0.2
    generate xit = xstar + rho*alpha
    generate y = beta0+beta1*xit+alpha+u

    *CALCULATE OLS ESTIMATES AND SAVE RESULTS 
    regress y xit 
    return scalar b_OLS=_b[xit]
    return scalar se_OLS=_se[xit]
    regress y xit, robust
    return scalar br_OLS=_b[xit]
    return scalar ser_OLS=_se[xit]

end

Here is updated usage:

simulate b_OLS=r(b_OLS) se_OLS=r(se_OLS) br=r(br_OLS) ser=r(ser_OLS) , seed(117) reps(1000) :treatment 400

Here is output (first 10 of 1000)

. list in 1/10, clean

          b_OLS     se_OLS         br        ser  
  1.   1.951832   .0210895   1.951832   .0206124  
  2.   1.974374   .0201377   1.974374   .0190302  
  3.   1.953961   .0188414   1.953961   .0192269  
  4.    1.92406   .0192231    1.92406   .0201013  
  5.   1.941915   .0184931   1.941915   .0184521  
  6.   1.927281    .019787   1.927281   .0200865  
  7.   1.960419   .0186951   1.960419   .0185654  
  8.   2.003222   .0197742   2.003222    .019516  
  9.   1.946948   .0209217   1.946948   .0207528  
 10.   1.960583   .0195687   1.960583   .0204068  

Upvotes: 3

Related Questions