Prajwal Mani Pradhan
Prajwal Mani Pradhan

Reputation: 397

calculating durations across variables

My data in Stata is like:

day1    day1_dt     day2    day2_dt     day3    day3_dt     day4    day4_dt     
0       2010-01-02  4       2010-01-03  .       2010-01-04  2       2010-01-05  
.       2011-05-02  3       2011-05-03  4       2011-05-04  4       2011-05-05  
5       2012-01-05  4       2012-01-06  4       2012-01-07  4       2012-01-08  
4       2015-05-02  4       2015-05-03  4       2015-05-04  4       2015-05-05  
1       2009-05-02  4       2009-05-03  0       2009-05-04  4       2009-05-05  

I want to calculate the following:

  1. Duration in days when dayX variable has 4 in them.

I think I solved the first part using following program

generate int flg1 =1 if day1 == 4
generate int flg2 =1 if day2 == 4
generate int flg3 =1 if day3 == 4
generate int flg4 =1 if day4 == 4

egen duration = rowtotal(flg*)
  1. Identify the date where the value of 4 is no more/has changed and record it in end_date

The final data would look like:

day1    day1_dt     day2    day2_dt     day3    day3_dt     day4    day4_dt     duration    end_date
0       2010-01-02  4       2010-01-03  .       2010-01-04  2       2010-01-05  1           2010-01-04          
.       2011-05-02  3       2011-05-03  4       2011-05-04  4       2011-05-05  2           .
5       2012-01-05  4       2012-01-06  4       2012-01-07  4       2012-01-08  3           .
4       2015-05-02  4       2015-05-03  4       2015-05-04  4       2015-05-05  4           .
1       2009-05-02  4       2009-05-03  0       2009-05-04  4       2009-05-05  2           .

Upvotes: 0

Views: 59

Answers (3)

Nick Cox
Nick Cox

Reputation: 37208

This is a sequel to the answer by @TheIceBear showing how to answer Question 2 while keeping the same layout.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)
0 4 . 2 18264 18265 18266 18267
. 3 4 4 18749 18750 18751 18752
5 4 4 4 18997 18998 18999 19000
4 4 4 4 20210 20211 20212 20213
1 4 0 4 18019 18020 18021 18022
end
format %tdCY-N-D day1_dt
format %tdCY-N-D day2_dt
format %tdCY-N-D day3_dt 
format %tdCY-N-D day4_dt


gen enddate = . 

* 1/4 is contingent on day*_dt running over 1 to 4 
* and on those variables being in date order 
forval j = 1/4 { 
    replace enddate = day`j'_dt if day`j' == 4 
}

egen whenlast = rowmax(day*_dt) 

replace enddate = . if enddate == whenlast 

format enddate whenlast %td 

list enddate whenlast 

Upvotes: 0

TheIceBear
TheIceBear

Reputation: 3255

There seems to be a typo in the second row in your final example. If not, then please explain why you want duration to be 1 and not 2 there.

If it was a typo, then this is the simplest way to do it. Note that it is only the last line of code that is the answer to your question.

// This is best practice way of sharing data examples in Stata on StackOverflow

* Example generated by -dataex-. For more info, type help dataex
clear
input byte day1 int day1_dt byte day2 int day2_dt byte day3 int day3_dt byte day4 int day4_dt
0 18264 4 18265 . 18266 2 18267
. 18749 3 18750 4 18751 4 18752
5 18997 4 18998 4 18999 4 19000
4 20210 4 20211 4 20212 4 20213
1 18019 4 18020 0 18021 4 18022
end
format %tdnn/dd/CCYY day1_dt
format %tdnn/dd/CCYY day2_dt
format %tdnn/dd/CCYY day3_dt
format %tdnn/dd/CCYY day4_dt

// This is your solution

* Count number of day1, day2 etc vars with value 4
egen duration = anycount(day?), values(4)

Upvotes: 1

Nick Cox
Nick Cox

Reputation: 37208

From a Stata point of view you seem to be holding panel or longitudinal data in a wide layout. As you are finding out, that makes even simple tasks quite complicated. I recommend changing to long layout with a reshape.

See the Stata tag wiki for how to give data examples as Stata code (short explanation: use the dataex command). Your example is fairly clear but requires a guess at what kind of date you have, as the display format could be YMD or YDM. I guessed one way, but the principles are the same the other way. If your date variables are really strings, you need to push them through daily() to do anything useful.

Script and output follow. You'll want to assign a display format to enddate as well.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)
0 4 . 2 18264 18265 18266 18267
. 3 4 4 18749 18750 18751 18752
5 4 4 4 18997 18998 18999 19000
4 4 4 4 20210 20211 20212 20213
1 4 0 4 18019 18020 18021 18022
end
format %tdCY-N-D day1_dt
format %tdCY-N-D day2_dt
format %tdCY-N-D day3_dt 
format %tdCY-N-D day4_dt

gen long id = _n 
reshape long day day@_dt, i(id)

egen duration = total(day == 4), by(id)

egen enddate = max(cond(day == 4, day_dt, .)), by(id)
egen whenlast = max(day_dt), by(id)
replace enddate = . if enddate == whenlast 

list, sepby(id)
. * Example generated by -dataex-. For more info, type help dataex
. clear

. input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)

         day1      day2      day3      day4    day1_dt    day2_dt    day3_dt    day4_dt
  1. 0 4 . 2 18264 18265 18266 18267
  2. . 3 4 4 18749 18750 18751 18752
  3. 5 4 4 4 18997 18998 18999 19000
  4. 4 4 4 4 20210 20211 20212 20213
  5. 1 4 0 4 18019 18020 18021 18022
  6. end

. format %tdCY-N-D day1_dt

. format %tdCY-N-D day2_dt

. format %tdCY-N-D day3_dt 

. format %tdCY-N-D day4_dt

. 
. gen long id = _n 

. reshape long day day@_dt, i(id)
(j = 1 2 3 4)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations                5   ->   20          
Number of variables                   9   ->   4           
j variable (4 values)                     ->   _j
xij variables:
                     day1 day2 ... day4   ->   day
            day1_dt day2_dt ... day4_dt   ->   day_dt
-----------------------------------------------------------------------------

. 
. egen duration = total(day == 4), by(id)

. 
. egen enddate = max(cond(day == 4, day_dt, .)), by(id)

. egen whenlast = max(day_dt), by(id)

. replace enddate = . if enddate == whenlast 
(16 real changes made, 16 to missing)

. 
. list, sepby(id)

     +------------------------------------------------------------+
     | id   _j   day       day_dt   duration   enddate   whenlast |
     |------------------------------------------------------------|
  1. |  1    1     0   2010-01-02          1     18265      18267 |
  2. |  1    2     4   2010-01-03          1     18265      18267 |
  3. |  1    3     .   2010-01-04          1     18265      18267 |
  4. |  1    4     2   2010-01-05          1     18265      18267 |
     |------------------------------------------------------------|
  5. |  2    1     .   2011-05-02          2         .      18752 |
  6. |  2    2     3   2011-05-03          2         .      18752 |
  7. |  2    3     4   2011-05-04          2         .      18752 |
  8. |  2    4     4   2011-05-05          2         .      18752 |
     |------------------------------------------------------------|
  9. |  3    1     5   2012-01-05          3         .      19000 |
 10. |  3    2     4   2012-01-06          3         .      19000 |
 11. |  3    3     4   2012-01-07          3         .      19000 |
 12. |  3    4     4   2012-01-08          3         .      19000 |
     |------------------------------------------------------------|
 13. |  4    1     4   2015-05-02          4         .      20213 |
 14. |  4    2     4   2015-05-03          4         .      20213 |
 15. |  4    3     4   2015-05-04          4         .      20213 |
 16. |  4    4     4   2015-05-05          4         .      20213 |
     |------------------------------------------------------------|
 17. |  5    1     1   2009-05-02          2         .      18022 |
 18. |  5    2     4   2009-05-03          2         .      18022 |
 19. |  5    3     0   2009-05-04          2         .      18022 |
 20. |  5    4     4   2009-05-05          2         .      18022 |
     +------------------------------------------------------------+

Upvotes: 0

Related Questions