Reputation: 13
I have a dataset in wide format that has quarterly counts of an event from Q1
1996
to Q4
2016
.
The variable names for each quarter are as follows:
yyyy0101_yyyy0401
yyyy0401_yyyy0701
yyyy0701_yyyy1001
yyyy1001_yyyy0101
I have a macro that converts them like this:
local i = 1996
forvalues x = 1996/2016 {
local i = `i'+1
gen count`x' = event_`x'0101_`x'0401 + event_`x'0401_`x'0701 +
event_`x'0701_`x'1001 + event_`x'1001_`i'0101
}
Then I collapse my data into a single variable count
in long format by year:
reshape long count, i(xvars) j(year)
Now I would like to do the same thing, but quarterly.
What is the macro to perform the exact same process, but capturing the sum of counts by year-quarter? What if I wanted to do it for half years?
Upvotes: 0
Views: 67
Reputation: 37278
In trying to understand this I note first (pedantically, or otherwise) that your code is not a macro in Stata's sense: SAS user???) and second (more constructively) that your code can be condensed to
forvalues x = 1996/2016 {
gen count`x' = event_`x'0101_`x'0401 + event_`x'0401_`x'0701 + event_`x'0701_`x'1001 + event_`x'1001_`x'0101
}
given that local macros i
and x
run over the same values. (Those are macros in Stata's sense.) But I would not start there.
As you say, the data are wide format (I prefer the term layout as less overloaded), so the main job is not to write loops at all, but to reshape
to long. This example embodies a little guesswork and shows some technique. Absent a data example to work with, I first create a sandbox:
clear
set obs 21
local y = 1
foreach v in yyyy0101_yyyy0401 yyyy0401_yyyy0701 yyyy0701_yyyy1001 yyyy1001_yyyy0101 {
gen `v' = `++y'
}
gen year = 1995 + _n
rename (yyyy*) (count#), addnumber
reshape long count, i(year) j(quarter)
gen qdate = yq(y, q)
egen ycount = total(count), by(year)
egen qcount = total(count), by(quarter)
gen half = cond(inlist(quarter, 1, 2), 1, 2)
egen hcount = total(count), by(year half)
list if year < 1998, sepby(year)
+------------------------------------------------------------------+
| year quarter count qdate ycount qcount half hcount |
|------------------------------------------------------------------|
1. | 1996 1 2 144 14 42 1 5 |
2. | 1996 2 3 145 14 63 1 5 |
3. | 1996 3 4 146 14 84 2 9 |
4. | 1996 4 5 147 14 105 2 9 |
|------------------------------------------------------------------|
5. | 1997 1 2 148 14 42 1 5 |
6. | 1997 2 3 149 14 63 1 5 |
7. | 1997 3 4 150 14 84 2 9 |
8. | 1997 4 5 151 14 105 2 9 |
+------------------------------------------------------------------+
How does this differ from what you did, as you already used reshape long
? Principally to underline that whatever totals you want on different time scales can be generated in place. You don't need repeated collapse
or different versions of the same dataset. How to tabulate, list or otherwise deal with the repetitions would be different questions.
Upvotes: 0