EconoQ
EconoQ

Reputation: 13

Year to month date conversion

I have a dataset in wide format that has quarterly counts of an event from Q1 1996 to Q4 2016.

The variable names for each quarter are as follows:

I have a macro that converts them like this:

local i = 1996

forvalues x = 1996/2016 { 
    local i = `i'+1
    gen count`x' = event_`x'0101_`x'0401 + event_`x'0401_`x'0701 + 
    event_`x'0701_`x'1001 + event_`x'1001_`i'0101
}

Then I collapse my data into a single variable count in long format by year:

reshape long count, i(xvars) j(year)

Now I would like to do the same thing, but quarterly.

What is the macro to perform the exact same process, but capturing the sum of counts by year-quarter? What if I wanted to do it for half years?

Upvotes: 0

Views: 67

Answers (1)

Nick Cox
Nick Cox

Reputation: 37278

In trying to understand this I note first (pedantically, or otherwise) that your code is not a macro in Stata's sense: SAS user???) and second (more constructively) that your code can be condensed to

forvalues x = 1996/2016 { 
    gen count`x' = event_`x'0101_`x'0401 + event_`x'0401_`x'0701 + event_`x'0701_`x'1001 + event_`x'1001_`x'0101
}

given that local macros i and x run over the same values. (Those are macros in Stata's sense.) But I would not start there.

As you say, the data are wide format (I prefer the term layout as less overloaded), so the main job is not to write loops at all, but to reshape to long. This example embodies a little guesswork and shows some technique. Absent a data example to work with, I first create a sandbox:

clear 
set obs 21
local y = 1 
foreach v in yyyy0101_yyyy0401 yyyy0401_yyyy0701 yyyy0701_yyyy1001 yyyy1001_yyyy0101 { 
    gen `v' = `++y' 
}
gen year = 1995 + _n 

rename (yyyy*) (count#), addnumber 

reshape long count, i(year) j(quarter) 

gen qdate = yq(y, q) 

egen ycount = total(count), by(year) 
egen qcount = total(count), by(quarter) 
gen half = cond(inlist(quarter, 1, 2), 1, 2)
egen hcount = total(count), by(year half) 

list if year < 1998, sepby(year) 

     +------------------------------------------------------------------+
     | year   quarter   count   qdate   ycount   qcount   half   hcount |
     |------------------------------------------------------------------|
  1. | 1996         1       2     144       14       42      1        5 |
  2. | 1996         2       3     145       14       63      1        5 |
  3. | 1996         3       4     146       14       84      2        9 |
  4. | 1996         4       5     147       14      105      2        9 |
     |------------------------------------------------------------------|
  5. | 1997         1       2     148       14       42      1        5 |
  6. | 1997         2       3     149       14       63      1        5 |
  7. | 1997         3       4     150       14       84      2        9 |
  8. | 1997         4       5     151       14      105      2        9 |
     +------------------------------------------------------------------+

How does this differ from what you did, as you already used reshape long? Principally to underline that whatever totals you want on different time scales can be generated in place. You don't need repeated collapse or different versions of the same dataset. How to tabulate, list or otherwise deal with the repetitions would be different questions.

Upvotes: 0

Related Questions