Reputation: 643
Sorry if the title of my question is unclear, but it's hard to summarize it on one line. I have a panel data set (codes to generate it are at the bottom):
. xtset id year
panel variable: id (strongly balanced)
time variable: year, 1 to 3
delta: 1 unit
. l, sep(3)
+-----------------+
| id year x |
|-----------------|
1. | 1 1 1.1 |
2. | 1 2 1.2 |
3. | 1 3 1.3 |
|-----------------|
4. | 2 1 2.1 |
5. | 2 2 2.2 |
6. | 2 3 2.3 |
+-----------------+
I want to create variables x_1
, x_2
and x_3
, where x_j
has the value of x
in year
j
for each id
. I can achieve it as follows (with no elegance pursued):
. forv k=1/3 {
2. capture drop tmp
3. gen tmp = x if year==`k'
4. by id: egen x_`k' = mean(tmp)
5. }
(4 missing values generated)
(4 missing values generated)
(4 missing values generated)
. drop tmp
. l, sep(3)
+-----------------------------------+
| id year x x_1 x_2 x_3 |
|-----------------------------------|
1. | 1 1 1.1 1.1 1.2 1.3 |
2. | 1 2 1.2 1.1 1.2 1.3 |
3. | 1 3 1.3 1.1 1.2 1.3 |
|-----------------------------------|
4. | 2 1 2.1 2.1 2.2 2.3 |
5. | 2 2 2.2 2.1 2.2 2.3 |
6. | 2 3 2.3 2.1 2.2 2.3 |
+-----------------------------------+
Is there a way without using a loop? I know I can write a program
or an ado
file (determining the variable names automatically), but I wonder if there are some builtin commands for my purpose.
The full commands are here.
clear all
set obs 6
gen id = floor((_n-1)/3)+1
by id, sort: gen year = _n
xtset id year
gen x = id+year/10
l, sep(3)
forv k=1/3 {
capture drop tmp
gen tmp = x if year==`k'
by id: egen x_`k' = mean(tmp)
}
drop tmp
l, sep(3)
Upvotes: 2
Views: 670
Reputation: 1338
A late entry, but you could avoid loops if you wanted by using reshape
and merge
:
clear *
input float(id year x)
1 1 1.1
1 2 1.2
1 3 1.3
2 1 2.1
2 2 2.2
2 3 2.3
end
tempfile master
save `master'
reshape wide x, i(id) j(year)
tempfile using
save `using'
use `master', clear
merge m:1 id using `using', nogen
Upvotes: 2
Reputation: 643
This "answer", which I post because it is too long as a comment, contains results from practices following Nick Cox's answer. All credits go to him.
Method 1: Use egen
and total, missing
.
levelsof year, local(yearlevels)
foreach v of varlist x {
foreach year of local yearlevels {
by id: egen `v'_`year' = total(`v' / (year==`year')), missing
}
}
The missing
option handles unbalanced panels.
Method 2: Use separate
and then copy the values.
foreach v of varlist x {
separate `v', by(year) gen(`v'_)
local newvars = r(varlist)
foreach w of local newvars {
by id: egen f_`w' = total(`w'), missing
}
drop `newvars'
}
This also handles unbalanced panels, but the new variable names are f_x_1
, etc. The first method needs the levels of year
, while the second needs creating a set of intermediate variables. I personally slightly prefer the first. It would be wonderful if Method 2 can be shortened.
Upvotes: 1
Reputation: 37208
Loops are good. What I can do for you is shorten your loop:
clear all
set obs 6
gen id = floor((_n-1)/3)+1
by id, sort: gen year = _n
xtset id year
gen x = id+year/10
l, sep(3)
forv k=1/3 {
by id: gen x_`k' = x[`k']
}
l, sep(3)
There is a decency assumption in there of a balanced panel. This loop makes no such assumption, but you need to loop over the observed years:
forv year = 1/3 {
by id: egen X_`year' = total(x / (year == `year'))
}
See also this discussion, especially Sections 9 and 10.
You may also be interested in separate
, which avoids an explicit loop, but only gets you part of the way to where you want to be.
All that said, it's hard to believe that you need these variables at all. The mechanism of time series operators solves many problems, while tools such as rangestat
(SSC) fill in many gaps.
Upvotes: 2