Reputation: 63

Stata merging data sets based on a range of dates

I realize that there is a Stata forum with this exact title, but I did not find its syntax all that helpful, especially since my datasets are a bit different. I have two datasets. One is the length of stay a person is in a facility, including the facility name. It looks like this:

+---+-------------+---------------+-----------------------+
|ID#|Entrance Date|  Exit Date    | Facility Name         |
|1  | 7/22/2009   | 2/24/2010     | Facility 1            |
|1  | 7/10/2010   | 11/21/2010    | Facility 2            |
|2  | 3/31/2010   | 9/23/2010     | Facility 1            |
|3  | 11/24/2010  | 7/5/2011      | Facility 3            |
|4  | 3/7/2007    | 4/19/2010     | Facility 2            |
+---+-------------+---------------+-----------------------+

The next data set shows the dates when they were visited. All it has in it is ID and visitation date:

+---+-------------+
|ID#|Visit Date   |
| 1 | 08/21/2009  |
| 1 | 09/02/2009  |
| 1 | 09/23/2009 |
| 3 | 04/22/2011  |
| 3 | 05/05/2011  |
+---+-------------+

I want to merge these two files together on ID# where VisitDate falls in between Entrance Date and Exit Date so that I can see 1. who had visitors, 2. what facilities they were in.

Upvotes: 2

Answers (3)

Robert Picard

Reputation: 1051

There's a new user-written program called rangejoin on SSC that is tailor-made for this type of problem. To install it, type in Stata's Command window:

ssc install rangejoin

rangejoin will pair each stay based on its date in and out (the bounds of the desired interval) and the visit date. All dates have to be numeric so I pre-converted all dates to Stata dates in the examples below.

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009"  "2/24/2010"  "Facility 1" 18100 18317
1 "7/10/2010"  "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010"  "9/23/2010"  "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011"   "Facility 3" 18590 18813
4 "3/7/2007"   "4/19/2010"  "Facility 2" 17232 18371
end
format %td datein
format %td dateout

rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)

     +-------------------------------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout        visit      nvisit   visit_~t |
     |-------------------------------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   08/21/2009   21aug2009          3 |
  2. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/02/2009   02sep2009          3 |
  3. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/23/2009   23sep2009          3 |
  4. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  5. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  6. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   04/22/2011   22apr2011          2 |
  7. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   05/05/2011   05may2011          2 |
     |-------------------------------------------------------------------------------------------------------|
  8. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010                        .          0 |
     +-------------------------------------------------------------------------------------------------------+

You can then, if desired, revert to the original observations using:

by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
     +------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout   visit_~t |
     |------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010          3 |
  2. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010          0 |
  3. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010          0 |
  4. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011          2 |
  5. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010          0 |
     +------------------------------------------------------------------------------+

Upvotes: 9

ander2ed

Reputation: 1338

Another approach uses joinby:

/* Set up Visits Data */
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen DateVisit = daily(Visit, "MDY") 
drop Visit 
tempfile Visits
save `Visits'

/* Set up Facilities Data */
clear 
input ID str10 (Entrance Exit Name)  
1 "7/22/2009"  "2/24/2010"  "Facility 1"  
1 "7/10/2010"  "11/21/2010" "Facility 2"  
2 "3/31/2010"  "9/23/2010"  "Facility 1" 
3 "11/24/2010" "7/5/2011"   "Facility 3"
4 "3/7/2007"   "4/19/2010"  "Facility 2" 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 

/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td

/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit

* or... 
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)

* or...
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)

     +---------------------------------------------------------------+
     | ID         Name   DateEnt~e    DateExit   DateVisit   Visitor |
     |---------------------------------------------------------------|
  1. |  1   Facility 1   22jul2009   24feb2010   21aug2009         1 |
  2. |  1   Facility 1   22jul2009   24feb2010   02sep2009         1 |
  3. |  1   Facility 1   22jul2009   24feb2010   23sep2009         1 |
  4. |  1   Facility 2   10jul2010   21nov2010           .         0 |
     |---------------------------------------------------------------|
  5. |  2   Facility 1   31mar2010   23sep2010           .         0 |
     |---------------------------------------------------------------|
  6. |  3   Facility 3   24nov2010   05jul2011   22apr2011         1 |
  7. |  3   Facility 3   24nov2010   05jul2011   05may2011         1 |
     |---------------------------------------------------------------|
  8. |  4   Facility 2   07mar2007   19apr2010           .         0 |
     +---------------------------------------------------------------+

Upvotes: 1

Nick Cox

Reputation: 37278

Any kind of merge seems unhelpful and inappropriate here as you can only match on identifiers. I would use append.

clear 
input ID str10 (Entrance Exit) Name  
1 "7/22/2009"  "2/24/2010"  1  
1 "7/10/2010"  "11/21/2010" 2  
2 "3/31/2010"  "9/23/2010"  1  
3 "11/24/2010" "7/5/2011"   3  
4 "3/7/2007"   "4/19/2010"  2 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 
sort ID, stable 
by ID : gen T = _n 
reshape long Date, i(ID T) j(Event) string 
drop T 
save Master, replace 
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen Date = daily(Visit, "MDY") 
drop Visit 
gen Event = "Visit" 
append using Master 
sort ID Date 
format Date %td 
list, sepby(ID)  

    +----------------------------------+
     | ID        Date      Event   Name |
     |----------------------------------|
  1. |  1   22jul2009   Entrance      1 |
  2. |  1   21aug2009      Visit      . |
  3. |  1   02sep2009      Visit      . |
  4. |  1   23sep2009      Visit      . |
  5. |  1   24feb2010       Exit      1 |
  6. |  1   10jul2010   Entrance      2 |
  7. |  1   21nov2010       Exit      2 |
     |----------------------------------|
  8. |  2   31mar2010   Entrance      1 |
  9. |  2   23sep2010       Exit      1 |
     |----------------------------------|
 10. |  3   24nov2010   Entrance      3 |
 11. |  3   22apr2011      Visit      . |
 12. |  3   05may2011      Visit      . |
 13. |  3   05jul2011       Exit      3 |
     |----------------------------------|
 14. |  4   07mar2007   Entrance      2 |
 15. |  4   19apr2010       Exit      2 |
     +----------------------------------+

See now here for how to fill in the missings

Upvotes: 1

Stata merging data sets based on a range of dates

Answers (3)

Related Questions