Reputation: 397
My data in Stata is like:
day1 day1_dt day2 day2_dt day3 day3_dt day4 day4_dt 0 2010-01-02 4 2010-01-03 . 2010-01-04 2 2010-01-05 . 2011-05-02 3 2011-05-03 4 2011-05-04 4 2011-05-05 5 2012-01-05 4 2012-01-06 4 2012-01-07 4 2012-01-08 4 2015-05-02 4 2015-05-03 4 2015-05-04 4 2015-05-05 1 2009-05-02 4 2009-05-03 0 2009-05-04 4 2009-05-05
I want to calculate the following:
I think I solved the first part using following program
generate int flg1 =1 if day1 == 4
generate int flg2 =1 if day2 == 4
generate int flg3 =1 if day3 == 4
generate int flg4 =1 if day4 == 4
egen duration = rowtotal(flg*)
The final data would look like:
day1 day1_dt day2 day2_dt day3 day3_dt day4 day4_dt duration end_date 0 2010-01-02 4 2010-01-03 . 2010-01-04 2 2010-01-05 1 2010-01-04 . 2011-05-02 3 2011-05-03 4 2011-05-04 4 2011-05-05 2 . 5 2012-01-05 4 2012-01-06 4 2012-01-07 4 2012-01-08 3 . 4 2015-05-02 4 2015-05-03 4 2015-05-04 4 2015-05-05 4 . 1 2009-05-02 4 2009-05-03 0 2009-05-04 4 2009-05-05 2 .
Upvotes: 0
Views: 59
Reputation: 37208
This is a sequel to the answer by @TheIceBear showing how to answer Question 2 while keeping the same layout.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)
0 4 . 2 18264 18265 18266 18267
. 3 4 4 18749 18750 18751 18752
5 4 4 4 18997 18998 18999 19000
4 4 4 4 20210 20211 20212 20213
1 4 0 4 18019 18020 18021 18022
end
format %tdCY-N-D day1_dt
format %tdCY-N-D day2_dt
format %tdCY-N-D day3_dt
format %tdCY-N-D day4_dt
gen enddate = .
* 1/4 is contingent on day*_dt running over 1 to 4
* and on those variables being in date order
forval j = 1/4 {
replace enddate = day`j'_dt if day`j' == 4
}
egen whenlast = rowmax(day*_dt)
replace enddate = . if enddate == whenlast
format enddate whenlast %td
list enddate whenlast
Upvotes: 0
Reputation: 3255
There seems to be a typo in the second row in your final example. If not, then please explain why you want duration
to be 1 and not 2 there.
If it was a typo, then this is the simplest way to do it. Note that it is only the last line of code that is the answer to your question.
// This is best practice way of sharing data examples in Stata on StackOverflow
* Example generated by -dataex-. For more info, type help dataex
clear
input byte day1 int day1_dt byte day2 int day2_dt byte day3 int day3_dt byte day4 int day4_dt
0 18264 4 18265 . 18266 2 18267
. 18749 3 18750 4 18751 4 18752
5 18997 4 18998 4 18999 4 19000
4 20210 4 20211 4 20212 4 20213
1 18019 4 18020 0 18021 4 18022
end
format %tdnn/dd/CCYY day1_dt
format %tdnn/dd/CCYY day2_dt
format %tdnn/dd/CCYY day3_dt
format %tdnn/dd/CCYY day4_dt
// This is your solution
* Count number of day1, day2 etc vars with value 4
egen duration = anycount(day?), values(4)
Upvotes: 1
Reputation: 37208
From a Stata point of view you seem to be holding panel or longitudinal data in a wide layout. As you are finding out, that makes even simple tasks quite complicated. I recommend changing to long layout with a reshape
.
See the Stata tag wiki for how to give data examples as Stata code (short explanation: use the dataex
command). Your example is fairly clear but requires a guess at what kind of date you have, as the display format could be YMD or YDM. I guessed one way, but the principles are the same the other way. If your date variables are really strings, you need to push them through daily()
to do anything useful.
Script and output follow. You'll want to assign a display format to enddate
as well.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)
0 4 . 2 18264 18265 18266 18267
. 3 4 4 18749 18750 18751 18752
5 4 4 4 18997 18998 18999 19000
4 4 4 4 20210 20211 20212 20213
1 4 0 4 18019 18020 18021 18022
end
format %tdCY-N-D day1_dt
format %tdCY-N-D day2_dt
format %tdCY-N-D day3_dt
format %tdCY-N-D day4_dt
gen long id = _n
reshape long day day@_dt, i(id)
egen duration = total(day == 4), by(id)
egen enddate = max(cond(day == 4, day_dt, .)), by(id)
egen whenlast = max(day_dt), by(id)
replace enddate = . if enddate == whenlast
list, sepby(id)
. * Example generated by -dataex-. For more info, type help dataex
. clear
. input byte(day1 day2 day3 day4) float(day1_dt day2_dt day3_dt day4_dt)
day1 day2 day3 day4 day1_dt day2_dt day3_dt day4_dt
1. 0 4 . 2 18264 18265 18266 18267
2. . 3 4 4 18749 18750 18751 18752
3. 5 4 4 4 18997 18998 18999 19000
4. 4 4 4 4 20210 20211 20212 20213
5. 1 4 0 4 18019 18020 18021 18022
6. end
. format %tdCY-N-D day1_dt
. format %tdCY-N-D day2_dt
. format %tdCY-N-D day3_dt
. format %tdCY-N-D day4_dt
.
. gen long id = _n
. reshape long day day@_dt, i(id)
(j = 1 2 3 4)
Data Wide -> Long
-----------------------------------------------------------------------------
Number of observations 5 -> 20
Number of variables 9 -> 4
j variable (4 values) -> _j
xij variables:
day1 day2 ... day4 -> day
day1_dt day2_dt ... day4_dt -> day_dt
-----------------------------------------------------------------------------
.
. egen duration = total(day == 4), by(id)
.
. egen enddate = max(cond(day == 4, day_dt, .)), by(id)
. egen whenlast = max(day_dt), by(id)
. replace enddate = . if enddate == whenlast
(16 real changes made, 16 to missing)
.
. list, sepby(id)
+------------------------------------------------------------+
| id _j day day_dt duration enddate whenlast |
|------------------------------------------------------------|
1. | 1 1 0 2010-01-02 1 18265 18267 |
2. | 1 2 4 2010-01-03 1 18265 18267 |
3. | 1 3 . 2010-01-04 1 18265 18267 |
4. | 1 4 2 2010-01-05 1 18265 18267 |
|------------------------------------------------------------|
5. | 2 1 . 2011-05-02 2 . 18752 |
6. | 2 2 3 2011-05-03 2 . 18752 |
7. | 2 3 4 2011-05-04 2 . 18752 |
8. | 2 4 4 2011-05-05 2 . 18752 |
|------------------------------------------------------------|
9. | 3 1 5 2012-01-05 3 . 19000 |
10. | 3 2 4 2012-01-06 3 . 19000 |
11. | 3 3 4 2012-01-07 3 . 19000 |
12. | 3 4 4 2012-01-08 3 . 19000 |
|------------------------------------------------------------|
13. | 4 1 4 2015-05-02 4 . 20213 |
14. | 4 2 4 2015-05-03 4 . 20213 |
15. | 4 3 4 2015-05-04 4 . 20213 |
16. | 4 4 4 2015-05-05 4 . 20213 |
|------------------------------------------------------------|
17. | 5 1 1 2009-05-02 2 . 18022 |
18. | 5 2 4 2009-05-03 2 . 18022 |
19. | 5 3 0 2009-05-04 2 . 18022 |
20. | 5 4 4 2009-05-05 2 . 18022 |
+------------------------------------------------------------+
Upvotes: 0