Reputation: 63
I realize that there is a Stata forum with this exact title, but I did not find its syntax all that helpful, especially since my datasets are a bit different. I have two datasets. One is the length of stay a person is in a facility, including the facility name. It looks like this:
+---+-------------+---------------+-----------------------+
|ID#|Entrance Date| Exit Date | Facility Name |
|1 | 7/22/2009 | 2/24/2010 | Facility 1 |
|1 | 7/10/2010 | 11/21/2010 | Facility 2 |
|2 | 3/31/2010 | 9/23/2010 | Facility 1 |
|3 | 11/24/2010 | 7/5/2011 | Facility 3 |
|4 | 3/7/2007 | 4/19/2010 | Facility 2 |
+---+-------------+---------------+-----------------------+
The next data set shows the dates when they were visited. All it has in it is ID and visitation date:
+---+-------------+
|ID#|Visit Date |
| 1 | 08/21/2009 |
| 1 | 09/02/2009 |
| 1 | 09/23/2009 |
| 3 | 04/22/2011 |
| 3 | 05/05/2011 |
+---+-------------+
I want to merge these two files together on ID#
where VisitDate
falls in between Entrance Date
and Exit Date
so that I can see 1. who had visitors, 2. what facilities they were in.
Upvotes: 2
Views: 6099
Reputation: 1051
There's a new user-written program called rangejoin
on SSC that is tailor-made for this type of problem. To install it, type in Stata's Command window:
ssc install rangejoin
rangejoin
will pair each stay based on its date in and out (the bounds of the desired interval) and the visit date. All dates have to be numeric so I pre-converted all dates to Stata dates in the examples below.
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009" "2/24/2010" "Facility 1" 18100 18317
1 "7/10/2010" "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010" "9/23/2010" "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011" "Facility 3" 18590 18813
4 "3/7/2007" "4/19/2010" "Facility 2" 17232 18371
end
format %td datein
format %td dateout
rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)
+-------------------------------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit nvisit visit_~t |
|-------------------------------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 08/21/2009 21aug2009 3 |
2. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/02/2009 02sep2009 3 |
3. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/23/2009 23sep2009 3 |
4. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
5. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
6. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 04/22/2011 22apr2011 2 |
7. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 05/05/2011 05may2011 2 |
|-------------------------------------------------------------------------------------------------------|
8. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 . 0 |
+-------------------------------------------------------------------------------------------------------+
You can then, if desired, revert to the original observations using:
by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
+------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit_~t |
|------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 3 |
2. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 0 |
3. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 0 |
4. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 2 |
5. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 0 |
+------------------------------------------------------------------------------+
Upvotes: 9
Reputation: 1338
Another approach uses joinby
:
/* Set up Visits Data */
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen DateVisit = daily(Visit, "MDY")
drop Visit
tempfile Visits
save `Visits'
/* Set up Facilities Data */
clear
input ID str10 (Entrance Exit Name)
1 "7/22/2009" "2/24/2010" "Facility 1"
1 "7/10/2010" "11/21/2010" "Facility 2"
2 "3/31/2010" "9/23/2010" "Facility 1"
3 "11/24/2010" "7/5/2011" "Facility 3"
4 "3/7/2007" "4/19/2010" "Facility 2"
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td
/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit
* or...
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)
* or...
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)
+---------------------------------------------------------------+
| ID Name DateEnt~e DateExit DateVisit Visitor |
|---------------------------------------------------------------|
1. | 1 Facility 1 22jul2009 24feb2010 21aug2009 1 |
2. | 1 Facility 1 22jul2009 24feb2010 02sep2009 1 |
3. | 1 Facility 1 22jul2009 24feb2010 23sep2009 1 |
4. | 1 Facility 2 10jul2010 21nov2010 . 0 |
|---------------------------------------------------------------|
5. | 2 Facility 1 31mar2010 23sep2010 . 0 |
|---------------------------------------------------------------|
6. | 3 Facility 3 24nov2010 05jul2011 22apr2011 1 |
7. | 3 Facility 3 24nov2010 05jul2011 05may2011 1 |
|---------------------------------------------------------------|
8. | 4 Facility 2 07mar2007 19apr2010 . 0 |
+---------------------------------------------------------------+
Upvotes: 1
Reputation: 37278
Any kind of merge
seems unhelpful and inappropriate here as you can only match on identifiers. I would use append
.
clear
input ID str10 (Entrance Exit) Name
1 "7/22/2009" "2/24/2010" 1
1 "7/10/2010" "11/21/2010" 2
2 "3/31/2010" "9/23/2010" 1
3 "11/24/2010" "7/5/2011" 3
4 "3/7/2007" "4/19/2010" 2
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
sort ID, stable
by ID : gen T = _n
reshape long Date, i(ID T) j(Event) string
drop T
save Master, replace
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen Date = daily(Visit, "MDY")
drop Visit
gen Event = "Visit"
append using Master
sort ID Date
format Date %td
list, sepby(ID)
+----------------------------------+
| ID Date Event Name |
|----------------------------------|
1. | 1 22jul2009 Entrance 1 |
2. | 1 21aug2009 Visit . |
3. | 1 02sep2009 Visit . |
4. | 1 23sep2009 Visit . |
5. | 1 24feb2010 Exit 1 |
6. | 1 10jul2010 Entrance 2 |
7. | 1 21nov2010 Exit 2 |
|----------------------------------|
8. | 2 31mar2010 Entrance 1 |
9. | 2 23sep2010 Exit 1 |
|----------------------------------|
10. | 3 24nov2010 Entrance 3 |
11. | 3 22apr2011 Visit . |
12. | 3 05may2011 Visit . |
13. | 3 05jul2011 Exit 3 |
|----------------------------------|
14. | 4 07mar2007 Entrance 2 |
15. | 4 19apr2010 Exit 2 |
+----------------------------------+
See now here for how to fill in the missings
Upvotes: 1