Reputation: 133
I have a quarterly dataset (over three years) at the person level with doctor name and quarter. Most people have multiple doctors each quarter. I would like to create a new variable that keeps track of the new doctors in each quarter (starting at the 5th quarter) where a new doctor is defined as not having appeared in the previous 4 quarters. So if this is what the input data looks like for one person:
| ID | Quarter | Doctor_Name |
|----|---------|-------------|
| 1 | 1 | Dr. Smith |
| 1 | 1 | Dr. Brown |
| 1 | 1 | Dr. Smith |
| 1 | 2 | Dr. Patel |
| 1 | 2 | Dr. Garcia |
| 1 | 2 | Dr. Garcia |
| 1 | 3 | Dr. Kim |
| 1 | 3 | Dr. Brown |
| 1 | 3 | Dr. Patel |
| 1 | 4 | Dr. Patel |
| 1 | 4 | Dr. Craig |
| 1 | 4 | Dr. Craig |
| 1 | 5 | Dr. Brown |
| 1 | 5 | Dr. Lee |
| 1 | 5 | Dr. Kim |
| 1 | 6 | Dr. Patel |
| 1 | 6 | Dr. Smith |
| 1 | 6 | Dr. Smith |
| 1 | 7 | Dr. Garcia |
| 1 | 7 | Dr. Smith |
| 1 | 7 | Dr. Kim |
| 1 | 8 | Dr. Lee |
| 1 | 8 | Dr. Brown |
| 1 | 8 | Dr. Smith |
This is what the output should look like:
| ID | Quarter | Doctor_Name | New_Doctor |
|----|---------|-------------|---------------|
| 1 | 1 | Dr. Smith | |
| 1 | 1 | Dr. Brown | |
| 1 | 1 | Dr. Smith | |
| 1 | 2 | Dr. Patel | |
| 1 | 2 | Dr. Garcia | |
| 1 | 2 | Dr. Garcia | |
| 1 | 3 | Dr. Kim | |
| 1 | 3 | Dr. Brown | |
| 1 | 3 | Dr. Patel | |
| 1 | 4 | Dr. Patel | |
| 1 | 4 | Dr. Craig | |
| 1 | 4 | Dr. Craig | |
| 1 | 5 | Dr. Brown | 0 |
| 1 | 5 | Dr. Lee | 1 |
| 1 | 5 | Dr. Kim | 0 |
| 1 | 6 | Dr. Patel | 0 |
| 1 | 6 | Dr. Smith | 1 |
| 1 | 6 | Dr. Smith | 1 |
| 1 | 7 | Dr. Garcia | 1 |
| 1 | 7 | Dr. Smith | 0 |
| 1 | 7 | Dr. Kim | 0 |
| 1 | 8 | Dr. Lee | 0 |
| 1 | 8 | Dr. Brown | 0 |
| 1 | 8 | Dr. Smith | 0 |
How can I create this new variable assuming that the dataset has multiple IDs? Thanks!
Upvotes: -1
Views: 36
Reputation: 37278
Please read the Stata tag wiki. It is a good idea to use dataex
for data examples and at least to try some code. (Those are probably why you got down-voted.)
Here is some technique.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id quarter) str13 doctor_name
1 1 "Dr. Smith"
1 1 "Dr. Brown"
1 1 "Dr. Smith"
1 2 "Dr. Patel"
1 2 "Dr. Garcia"
1 2 "Dr. Garcia"
1 3 "Dr. Kim"
1 3 "Dr. Brown"
1 3 "Dr. Patel"
1 4 "Dr. Patel"
1 4 "Dr. Craig"
1 4 "Dr. Craig"
1 5 "Dr. Brown"
1 5 "Dr. Lee"
1 5 "Dr. Kim"
1 6 "Dr. Patel"
1 6 "Dr. Smith"
1 6 "Dr. Smith"
1 7 "Dr. Garcia"
1 7 "Dr. Smith"
1 7 "Dr. Kim"
1 8 "Dr. Lee"
1 8 "Dr. Brown"
1 8 "Dr. Smith"
end
bysort id doctor_name (quarter) : gen new1 = _n == 1
by id doctor_name : replace new1 = 1 if new1[_n-1] == 1 & quarter == quarter[_n-1]
gen new2 = new1 if quarter > 4
sort id quarter doctor_name
list, sepby(id quarter)
+-----------------------------------------+
| id quarter doctor_n~e new1 new2 |
|-----------------------------------------|
1. | 1 1 Dr. Brown 1 . |
2. | 1 1 Dr. Smith 1 . |
3. | 1 1 Dr. Smith 1 . |
|-----------------------------------------|
4. | 1 2 Dr. Garcia 1 . |
5. | 1 2 Dr. Garcia 1 . |
6. | 1 2 Dr. Patel 1 . |
|-----------------------------------------|
7. | 1 3 Dr. Brown 0 . |
8. | 1 3 Dr. Kim 1 . |
9. | 1 3 Dr. Patel 0 . |
|-----------------------------------------|
10. | 1 4 Dr. Craig 1 . |
11. | 1 4 Dr. Craig 1 . |
12. | 1 4 Dr. Patel 0 . |
|-----------------------------------------|
13. | 1 5 Dr. Brown 0 0 |
14. | 1 5 Dr. Kim 0 0 |
15. | 1 5 Dr. Lee 1 1 |
|-----------------------------------------|
16. | 1 6 Dr. Patel 0 0 |
17. | 1 6 Dr. Smith 0 0 |
18. | 1 6 Dr. Smith 0 0 |
|-----------------------------------------|
19. | 1 7 Dr. Garcia 0 0 |
20. | 1 7 Dr. Kim 0 0 |
21. | 1 7 Dr. Smith 0 0 |
|-----------------------------------------|
22. | 1 8 Dr. Brown 0 0 |
23. | 1 8 Dr. Lee 0 0 |
24. | 1 8 Dr. Smith 0 0 |
+-----------------------------------------+
EDIT
Your desired output is just a (0, 1) variable, but the question hints at counting new doctors each quarter. To do that, each doctor must be counted just once the first time seen.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id quarter) str13 doctor_name
1 1 "Dr. Smith"
1 1 "Dr. Brown"
1 1 "Dr. Smith"
1 2 "Dr. Patel"
1 2 "Dr. Garcia"
1 2 "Dr. Garcia"
1 3 "Dr. Kim"
1 3 "Dr. Brown"
1 3 "Dr. Patel"
1 4 "Dr. Patel"
1 4 "Dr. Craig"
1 4 "Dr. Craig"
1 5 "Dr. Brown"
1 5 "Dr. Lee"
1 5 "Dr. Kim"
1 6 "Dr. Patel"
1 6 "Dr. Smith"
1 6 "Dr. Smith"
1 7 "Dr. Garcia"
1 7 "Dr. Smith"
1 7 "Dr. Kim"
1 8 "Dr. Lee"
1 8 "Dr. Brown"
1 8 "Dr. Smith"
end
bysort id doctor_name (quarter) : gen new3 = _n == 1
egen new_total = total(new3), by(id quarter)
tabdisp quarter id, c(new_total)
----------------
| id
quarter | 1
----------+-----
1 | 2
2 | 2
3 | 1
4 | 1
5 | 1
6 | 0
7 | 0
8 | 0
----------------
EDIT 2
For your problem (thanks for comments) it seems simplest just to keep track of the previous quarter the doctor appeared.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id quarter) str13 doctor_name
1 1 "Dr. Smith"
1 1 "Dr. Brown"
1 1 "Dr. Smith"
1 2 "Dr. Patel"
1 2 "Dr. Garcia"
1 2 "Dr. Garcia"
1 3 "Dr. Kim"
1 3 "Dr. Brown"
1 3 "Dr. Patel"
1 4 "Dr. Patel"
1 4 "Dr. Craig"
1 4 "Dr. Craig"
1 5 "Dr. Brown"
1 5 "Dr. Lee"
1 5 "Dr. Kim"
1 6 "Dr. Patel"
1 6 "Dr. Smith"
1 6 "Dr. Smith"
1 7 "Dr. Garcia"
1 7 "Dr. Smith"
1 7 "Dr. Kim"
1 8 "Dr. Lee"
1 8 "Dr. Brown"
1 8 "Dr. Smith"
end
bysort id doctor_name (quarter) : gen prev = quarter[_n-1] if quarter != quarter[_n-1]
by id doctor_name : replace prev = prev[_n-1] if missing(prev)
list, sepby(id doctor_name)
gen new = quarter > prev + 4
sort id quarter doctor_name
list , sepby(id quarter)
Listings
. list, sepby(id doctor_name)
+----------------------------------+
| id quarter doctor_n~e prev |
|----------------------------------|
1. | 1 1 Dr. Brown . |
2. | 1 3 Dr. Brown 1 |
3. | 1 5 Dr. Brown 3 |
4. | 1 8 Dr. Brown 5 |
|----------------------------------|
5. | 1 4 Dr. Craig . |
6. | 1 4 Dr. Craig . |
|----------------------------------|
7. | 1 2 Dr. Garcia . |
8. | 1 2 Dr. Garcia . |
9. | 1 7 Dr. Garcia 2 |
|----------------------------------|
10. | 1 3 Dr. Kim . |
11. | 1 5 Dr. Kim 3 |
12. | 1 7 Dr. Kim 5 |
|----------------------------------|
13. | 1 5 Dr. Lee . |
14. | 1 8 Dr. Lee 5 |
|----------------------------------|
15. | 1 2 Dr. Patel . |
16. | 1 3 Dr. Patel 2 |
17. | 1 4 Dr. Patel 3 |
18. | 1 6 Dr. Patel 4 |
|----------------------------------|
19. | 1 1 Dr. Smith . |
20. | 1 1 Dr. Smith . |
21. | 1 6 Dr. Smith 1 |
22. | 1 6 Dr. Smith 1 |
23. | 1 7 Dr. Smith 6 |
24. | 1 8 Dr. Smith 7 |
+----------------------------------+
.
. gen new = quarter > prev + 4
.
. sort id quarter doctor_name
.
. list , sepby(id quarter)
+----------------------------------------+
| id quarter doctor_n~e prev new |
|----------------------------------------|
1. | 1 1 Dr. Brown . 0 |
2. | 1 1 Dr. Smith . 0 |
3. | 1 1 Dr. Smith . 0 |
|----------------------------------------|
4. | 1 2 Dr. Garcia . 0 |
5. | 1 2 Dr. Garcia . 0 |
6. | 1 2 Dr. Patel . 0 |
|----------------------------------------|
7. | 1 3 Dr. Brown 1 0 |
8. | 1 3 Dr. Kim . 0 |
9. | 1 3 Dr. Patel 2 0 |
|----------------------------------------|
10. | 1 4 Dr. Craig . 0 |
11. | 1 4 Dr. Craig . 0 |
12. | 1 4 Dr. Patel 3 0 |
|----------------------------------------|
13. | 1 5 Dr. Brown 3 0 |
14. | 1 5 Dr. Kim 3 0 |
15. | 1 5 Dr. Lee . 0 |
|----------------------------------------|
16. | 1 6 Dr. Patel 4 0 |
17. | 1 6 Dr. Smith 1 1 |
18. | 1 6 Dr. Smith 1 1 |
|----------------------------------------|
19. | 1 7 Dr. Garcia 2 1 |
20. | 1 7 Dr. Kim 5 0 |
21. | 1 7 Dr. Smith 6 0 |
|----------------------------------------|
22. | 1 8 Dr. Brown 5 0 |
23. | 1 8 Dr. Lee 5 0 |
24. | 1 8 Dr. Smith 7 0 |
+----------------------------------------+
Upvotes: 0