Reputation: 75
I want to calculate growth rates in Stata for observations having the same ID. My data looks like this in a simplified way:
ID year a b c d e f
10 2010 2 4 9 8 4 2
10 2011 3 5 4 6 5 4
220 2010 1 6 11 14 2 5
220 2011 6 2 12 10 5 4
334 2010 4 5 4 6 1 4
334 2011 5 5 4 4 3 2
Now I want to calculate, for each ID, the growth rates of variables a-f from 2010 to 2011.
For example, for ID 10 and variable a it would be (3-2)/2, for variable b (5-4)/4, etc. The results should be stored in new variables (e.g. growth_a, growth_b, etc.).
Since I have over 120k observations and around 300 variables, is there an efficient way to do so (loop)?
My code looks like the following (simplified):
local variables "a b c d e f"
foreach x in local variables {
bys ID: g `x'_gr = (`x'[_n]-`x'[_n-1])/`x'[_n-1]
}
FYI: variables a-f are numeric.
But Stata says: 'local not found' and I am not sure whether the code is correct. Do I also have to sort for year first?
Upvotes: 1
Views: 6433
Reputation: 37278
The specific error in
local variables "a b c d e f"
foreach x in local variables {
bys ID: g `x'_gr = (`x'[_n]-`x'[_n-1])/`x'[_n-1]
}
is an error in the syntax of foreach, which here expects syntax like foreach x of local variables, given your prior use of a local macro. With the keyword in, foreach takes the word local literally and here looks for a variable with that name: hence the error message. This is basic foreach syntax: see its help.
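As a minimal sketch of the difference (the counters n_of and n_in are hypothetical names used only for illustration), both loops below step through the six variable names. The first expands the local macro by name; the second uses in, so the macro has to be dereferenced explicitly:

```stata
* The macro from the question
local variables "a b c d e f"

* "of local" takes the NAME of a local macro and loops over its contents
local n_of = 0
foreach x of local variables {
    local ++n_of
}

* "in" takes a literal list, so the macro must be expanded with `...'
local n_in = 0
foreach x in `variables' {
    local ++n_in
}

display "of local: `n_of' words; in: `n_in' words"
```

Run the whole block in one go from a do-file, since local macros do not survive between separately executed chunks.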
This code is problematic for further reasons.
Sorting on ID does not guarantee the correct sort order, here time order by year, for each distinct ID. If observations are jumbled within ID, results will be garbage.
The code assumes that all time values are present; otherwise the time gap between observations might be unequal.
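If you want to keep the subscripting approach, one possible repair for both problems is sketched below. It is not the only way: the toy data are a deliberately jumbled subset of your example, and the if condition is my own assumption about how you would want gaps treated (growth is left missing unless the previous observation is exactly one year earlier).

```stata
* Toy data: a subset of the question's example, entered out of time order
clear
input ID year a b
10  2011 3 5
10  2010 2 4
220 2011 6 2
220 2010 1 6
end

foreach x in a b {
    * (year) sorts within ID by year without making year part of the by-group;
    * the if condition guards against unequal gaps between observations
    bysort ID (year): gen `x'_gr = (`x' - `x'[_n-1]) / `x'[_n-1] if year == year[_n-1] + 1
}

list, sepby(ID)
```

The first observation in each ID has no predecessor, so its growth rate is missing.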
A cleaner way to get growth rates is
tsset ID year
foreach x in a b c d e f {
gen `x'_gr = D.`x'/L.`x'
}
Once you have tsset (or xtset) the time series operators can be used without fear: correct sorting is automatic and the operators are smart about gaps in the data (e.g. jumps from 1982 to 1984 in yearly data).
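A small made-up panel (the values are invented purely for illustration) shows the behaviour with a gap. Here 1984 is absent, so L.a for 1985 refers to a non-existent observation and the growth rate is returned as missing rather than being computed across two years:

```stata
* Invented data with a gap: no observation for 1984
clear
input ID year a
1 1982 10
1 1983 12
1 1985 15
end

tsset ID year

* D.a is a - L.a; both are missing whenever the previous year is absent
gen a_gr = D.a / L.a

list
```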
For more variables the loop could be
foreach x of var <whatever> {
gen `x'_gr = D.`x'/L.`x'
}
where <whatever> could be a general (numeric) varlist.
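With around 300 variables you may not want to type the varlist at all. One way to build it, sketched here on your example data, is to let ds pick out the numeric variables and then drop the identifiers. The macro names numvars and idvars are my own, and I am assuming every numeric variable other than ID and year should get a growth rate:

```stata
* The example data from the question
clear
input ID year a b c d e f
10  2010 2 4  9  8 4 2
10  2011 3 5  4  6 5 4
220 2010 1 6 11 14 2 5
220 2011 6 2 12 10 5 4
334 2010 4 5  4  6 1 4
334 2011 5 5  4  4 3 2
end

tsset ID year

* Collect all numeric variables, then remove the panel and time identifiers
ds, has(type numeric)
local numvars `r(varlist)'
local idvars ID year
local numvars : list numvars - idvars

* The list is fixed before the loop starts, so new *_gr variables are not revisited
foreach x of local numvars {
    gen `x'_gr = D.`x' / L.`x'
}

list ID year a_gr-f_gr, sepby(ID)
```

If the variables you need happen to be contiguous in the dataset, foreach x of varlist a-f is a simpler alternative.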
EDIT: The question has changed since first posting and interest is declared in calculating growth rates only from 2010 to 2011, with the implication in the example that only those years are present. The more general code above will naturally still work for calculating those growth rates.
Upvotes: 4