Reputation: 33
I have a dataset which records patient’s each instance of hospital admission. Each record therefore has a patient id, admission date, and the date of discharge. Patients can be discharged and admitted again on same day to a separate hospital, separate ward or a patient can be admitted to a separate hospital before the initial recorded discharged date (nested transfer). I am trying to create a variable that calculates a continuous periods of hospital care which groups serial transfer (admission date is equal to the previous discharge date), overlapping transfer (admission date before previous discharge date) and nested transfer as a single episode of hospital stay. Here is the example data.
library (tidyverse)
library (lubridate)
record_no <- seq(1:16)
id <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2)
adm_date<- c("14 dec 2012", "10 jan 2013", "19 march 2013", "9 april 2013", "3 may 2013",
"24 May 2013",
"7 June 2013", "23 Jul 2013", "7 Nov 2014", "23 feb 2015", "13 March 2015",
"1 April 2015", "7 Nov 2014", "23 feb 2015", "13 March 2015",
"1 April 2015")
dis_date<- c("14 dec 2012", "19 jan 2013", "9 april 2013", "3 may 2013", "24 May 2013",
"4 June 2013", "24 Jul 2013", "7 Nov 2014", "23 feb 2015", "13 March 2015", "1 April 2015" ,
"3 april 2015", "23 feb 2015", "13 March 2015", "1 April 2015" ,
"3 april 2015")
ami_data <- data.frame(record_no, id, adm_date, dis_date, ) %>%
mutate (adm_date=dmy(adm_date), dis_date=dmy(dis_date))
I can count the number of 'transfers' or discharges per patient using group_by
by id
.
morb_seq <- ami_data %>%
arrange(id, adm_date, dis_date) %>%
group_by(id) %>%
mutate(morbseq=seq(id)) %>%
ungroup
kable(morb_seq)
> | record_no| id|adm_date |dis_date | morbseq|
> |---------:|--:|:----------|:----------|-------:|
> | 1| 1|2012-12-14 |2012-12-14 | 1|
> | 2| 1|2013-01-10 |2013-01-19 | 2|
> | 3| 1|2013-03-19 |2013-04-09 | 3|
> | 4| 1|2013-04-09 |2013-05-03 | 4|
> | 5| 1|2013-05-03 |2013-05-24 | 5|
> | 6| 1|2013-05-24 |2013-06-04 | 6|
> | 7| 1|2013-06-07 |2013-07-24 | 7|
> | 8| 1|2013-07-23 |2014-11-07 | 8|
> | 9| 1|2014-11-07 |2015-02-23 | 9|
> | 10| 1|2015-02-23 |2015-03-13 | 10|
> | 11| 1|2015-03-13 |2015-04-01 | 11|
> | 12| 1|2015-04-01 |2015-04-03 | 12|
> | 13| 2|2014-11-07 |2015-02-23 | 1|
> | 14| 2|2015-02-23 |2015-03-13 | 2|
> | 15| 2|2015-03-13 |2015-04-01 | 3|
> | 16| 2|2015-04-01 |2015-04-03 | 4|
Patient 1 was discharged 12 times, but in fact stayed four contiguous duration in hospital (four continuous period: 1, 2, 3-6, and 7-12). So what I would like to do is that for record numbers 3 to 6, the final separation date to be 2013-6-4 so that the length of stay for that particular episode (consisting four separate separation date) would be the difference between the admission date (2013-03-19) and this final separation date (2013-6-04), and for records numbers 7-12, the final discharge date to be 2015-4-01. If I group_by
id
and take the the max(dis_date)
I get 2015-4-03, which is the last discharge date for that patient overall.
How would I do this in R?
Upvotes: 3
Views: 455
Reputation: 24722
This might help you.
morb_seq %>%
group_by(id) %>%
mutate(new_stay = as.numeric(adm_date-lag(dis_date))>0,
stay_no = cumsum(if_else(is.na(new_stay),TRUE,new_stay))) %>%
group_by(id,stay_no) %>%
mutate(discharge_date=max(dis_date)) %>%
ungroup() %>%
select(!new_stay:stay_no)
Output:
record_no id adm_date dis_date morbseq discharge_date
<int> <dbl> <date> <date> <int> <date>
1 1 1 2012-12-14 2012-12-14 1 2012-12-14
2 2 1 2013-01-10 2013-01-19 2 2013-01-19
3 3 1 2013-03-19 2013-04-09 3 2013-06-04
4 4 1 2013-04-09 2013-05-03 4 2013-06-04
5 5 1 2013-05-03 2013-05-24 5 2013-06-04
6 6 1 2013-05-24 2013-06-04 6 2013-06-04
7 7 1 2013-06-07 2013-07-24 7 2015-04-03
8 8 1 2013-07-23 2014-11-07 8 2015-04-03
9 9 1 2014-11-07 2015-02-23 9 2015-04-03
10 10 1 2015-02-23 2015-03-13 10 2015-04-03
11 11 1 2015-03-13 2015-04-01 11 2015-04-03
12 12 1 2015-04-01 2015-04-03 12 2015-04-03
13 13 2 2014-11-07 2015-02-23 1 2015-04-03
14 14 2 2015-02-23 2015-03-13 2 2015-04-03
15 15 2 2015-03-13 2015-04-01 3 2015-04-03
16 16 2 2015-04-01 2015-04-03 4 2015-04-03
If you want to collapse this down to one row per every stay, you could change shorten this to:
morb_seq %>%
group_by(id) %>%
mutate(new_stay = as.numeric(adm_date-lag(dis_date))>0,
stay_no = cumsum(if_else(is.na(new_stay),TRUE,new_stay))) %>%
group_by(id,stay_no) %>%
summarize(discharge_date=max(dis_date))
Output:
id stay_no discharge_date
<dbl> <int> <date>
1 1 1 2012-12-14
2 1 2 2013-01-19
3 1 3 2013-06-04
4 1 4 2015-04-03
5 2 1 2015-04-03
Upvotes: 2