chan1142
chan1142

Reputation: 643

Stata - how to create T variables that have values for each t in panel data

Sorry if the title of my question is unclear, but it's hard to summarize it on one line. I have a panel data set (codes to generate it are at the bottom):

. xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 1 to 3
                delta:  1 unit

. l, sep(3)

     +-----------------+
     | id   year     x |
     |-----------------|
  1. |  1      1   1.1 |
  2. |  1      2   1.2 |
  3. |  1      3   1.3 |
     |-----------------|
  4. |  2      1   2.1 |
  5. |  2      2   2.2 |
  6. |  2      3   2.3 |
     +-----------------+

I want to create variables x_1, x_2 and x_3, where x_j has the value of x in year j for each id. I can achieve it as follows (with no elegance pursued):

. forv k=1/3 {
  2. capture drop tmp
  3. gen tmp = x if year==`k'
  4. by id: egen x_`k' = mean(tmp)
  5. }
(4 missing values generated)
(4 missing values generated)
(4 missing values generated)

. drop tmp

. l, sep(3)

     +-----------------------------------+
     | id   year     x   x_1   x_2   x_3 |
     |-----------------------------------|
  1. |  1      1   1.1   1.1   1.2   1.3 |
  2. |  1      2   1.2   1.1   1.2   1.3 |
  3. |  1      3   1.3   1.1   1.2   1.3 |
     |-----------------------------------|
  4. |  2      1   2.1   2.1   2.2   2.3 |
  5. |  2      2   2.2   2.1   2.2   2.3 |
  6. |  2      3   2.3   2.1   2.2   2.3 |
     +-----------------------------------+

Is there a way without using a loop? I know I can write a program or an ado file (determining the variable names automatically), but I wonder if there are some builtin commands for my purpose.

The full commands are here.

clear all
set obs 6
gen id = floor((_n-1)/3)+1
by id, sort: gen year = _n
xtset id year
gen x = id+year/10
l, sep(3)
forv k=1/3 {
    capture drop tmp
    gen tmp = x if year==`k'
    by id: egen x_`k' = mean(tmp)
}
drop tmp
l, sep(3)

Upvotes: 2

Views: 670

Answers (3)

ander2ed
ander2ed

Reputation: 1338

A late entry, but you could avoid loops if you wanted by using reshape and merge:

clear *
input float(id year x)
1 1 1.1
1 2 1.2
1 3 1.3
2 1 2.1
2 2 2.2
2 3 2.3
end

tempfile master
save `master'

reshape wide x, i(id) j(year)

tempfile using
save `using'

use `master', clear
merge m:1 id using `using', nogen

Upvotes: 2

chan1142
chan1142

Reputation: 643

This "answer", which I post because it is too long as a comment, contains results from practices following Nick Cox's answer. All credits go to him.

Method 1: Use egen and total, missing.

levelsof year, local(yearlevels)
foreach v of varlist x {
    foreach year of local yearlevels {
        by id: egen `v'_`year' = total(`v' / (year==`year')), missing
    }
}

The missing option handles unbalanced panels.

Method 2: Use separate and then copy the values.

foreach v of varlist x {
    separate `v', by(year) gen(`v'_)
    local newvars = r(varlist)
    foreach w of local newvars {
        by id: egen f_`w' = total(`w'), missing
    }
    drop `newvars'
}

This also handles unbalanced panels, but the new variable names are f_x_1, etc. The first method needs the levels of year, while the second needs creating a set of intermediate variables. I personally slightly prefer the first. It would be wonderful if Method 2 can be shortened.

Upvotes: 1

Nick Cox
Nick Cox

Reputation: 37208

Loops are good. What I can do for you is shorten your loop:

clear all
set obs 6
gen id = floor((_n-1)/3)+1
by id, sort: gen year = _n
xtset id year
gen x = id+year/10
l, sep(3)

forv k=1/3 {
    by id: gen x_`k' = x[`k'] 
}

l, sep(3)

There is a decency assumption in there of a balanced panel. This loop makes no such assumption, but you need to loop over the observed years:

forv year = 1/3 {
    by id: egen X_`year' = total(x / (year == `year'))  
}

See also this discussion, especially Sections 9 and 10.

You may also be interested in separate, which avoids an explicit loop, but only gets you part of the way to where you want to be.

All that said, it's hard to believe that you need these variables at all. The mechanism of time series operators solves many problems, while tools such as rangestat (SSC) fill in many gaps.

Upvotes: 2

Related Questions