P Adhikari
P Adhikari

Reputation: 33

How to calculate contiguous hospital length of stay in R

I have a dataset which records patient’s each instance of hospital admission. Each record therefore has a patient id, admission date, and the date of discharge. Patients can be discharged and admitted again on same day to a separate hospital, separate ward or a patient can be admitted to a separate hospital before the initial recorded discharged date (nested transfer). I am trying to create a variable that calculates a continuous periods of hospital care which groups serial transfer (admission date is equal to the previous discharge date), overlapping transfer (admission date before previous discharge date) and nested transfer as a single episode of hospital stay. Here is the example data.

library (tidyverse)
library (lubridate)
record_no <- seq(1:16)
id <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2)
adm_date<- c("14 dec 2012", "10 jan 2013", "19 march 2013", "9 april 2013", "3 may 2013", 
             "24 May 2013", 
             "7 June 2013", "23 Jul 2013", "7 Nov 2014", "23 feb 2015", "13 March 2015", 
             "1 April 2015", "7 Nov 2014", "23 feb 2015", "13 March 2015", 
             "1 April 2015")
dis_date<- c("14 dec 2012", "19 jan 2013",  "9 april 2013", "3 may 2013", "24 May 2013", 
             "4 June 2013", "24 Jul 2013", "7 Nov 2014", "23 feb 2015", "13 March 2015", "1 April 2015" ,
             "3 april 2015", "23 feb 2015", "13 March 2015", "1 April 2015" ,
             "3 april 2015")

ami_data <- data.frame(record_no, id, adm_date, dis_date, ) %>% 
  mutate (adm_date=dmy(adm_date), dis_date=dmy(dis_date)) 

I can count the number of 'transfers' or discharges per patient using group_by by id.

morb_seq <- ami_data %>% 
  arrange(id, adm_date, dis_date) %>% 
  group_by(id) %>% 
  mutate(morbseq=seq(id)) %>% 
  ungroup  
 kable(morb_seq)


>     | record_no| id|adm_date   |dis_date   | morbseq|
>     |---------:|--:|:----------|:----------|-------:|
>     |         1|  1|2012-12-14 |2012-12-14 |       1|
>     |         2|  1|2013-01-10 |2013-01-19 |       2|
>     |         3|  1|2013-03-19 |2013-04-09 |       3|
>     |         4|  1|2013-04-09 |2013-05-03 |       4|
>     |         5|  1|2013-05-03 |2013-05-24 |       5|
>     |         6|  1|2013-05-24 |2013-06-04 |       6|
>     |         7|  1|2013-06-07 |2013-07-24 |       7|
>     |         8|  1|2013-07-23 |2014-11-07 |       8|
>     |         9|  1|2014-11-07 |2015-02-23 |       9|
>     |        10|  1|2015-02-23 |2015-03-13 |      10|
>     |        11|  1|2015-03-13 |2015-04-01 |      11|
>     |        12|  1|2015-04-01 |2015-04-03 |      12|
>     |        13|  2|2014-11-07 |2015-02-23 |       1|
>     |        14|  2|2015-02-23 |2015-03-13 |       2|
>     |        15|  2|2015-03-13 |2015-04-01 |       3|
>     |        16|  2|2015-04-01 |2015-04-03 |       4|

Patient 1 was discharged 12 times, but in fact stayed four contiguous duration in hospital (four continuous period: 1, 2, 3-6, and 7-12). So what I would like to do is that for record numbers 3 to 6, the final separation date to be 2013-6-4 so that the length of stay for that particular episode (consisting four separate separation date) would be the difference between the admission date (2013-03-19) and this final separation date (2013-6-04), and for records numbers 7-12, the final discharge date to be 2015-4-01. If I group_by id and take the the max(dis_date) I get 2015-4-03, which is the last discharge date for that patient overall.

How would I do this in R?

Upvotes: 3

Views: 455

Answers (1)

langtang
langtang

Reputation: 24722

This might help you.

morb_seq %>% 
  group_by(id) %>% 
  mutate(new_stay = as.numeric(adm_date-lag(dis_date))>0,
         stay_no = cumsum(if_else(is.na(new_stay),TRUE,new_stay))) %>% 
  group_by(id,stay_no) %>%
  mutate(discharge_date=max(dis_date)) %>% 
  ungroup() %>% 
  select(!new_stay:stay_no)

Output:

   record_no    id adm_date   dis_date   morbseq discharge_date
       <int> <dbl> <date>     <date>       <int> <date>        
 1         1     1 2012-12-14 2012-12-14       1 2012-12-14    
 2         2     1 2013-01-10 2013-01-19       2 2013-01-19    
 3         3     1 2013-03-19 2013-04-09       3 2013-06-04    
 4         4     1 2013-04-09 2013-05-03       4 2013-06-04    
 5         5     1 2013-05-03 2013-05-24       5 2013-06-04    
 6         6     1 2013-05-24 2013-06-04       6 2013-06-04    
 7         7     1 2013-06-07 2013-07-24       7 2015-04-03    
 8         8     1 2013-07-23 2014-11-07       8 2015-04-03    
 9         9     1 2014-11-07 2015-02-23       9 2015-04-03    
10        10     1 2015-02-23 2015-03-13      10 2015-04-03    
11        11     1 2015-03-13 2015-04-01      11 2015-04-03    
12        12     1 2015-04-01 2015-04-03      12 2015-04-03    
13        13     2 2014-11-07 2015-02-23       1 2015-04-03    
14        14     2 2015-02-23 2015-03-13       2 2015-04-03    
15        15     2 2015-03-13 2015-04-01       3 2015-04-03    
16        16     2 2015-04-01 2015-04-03       4 2015-04-03 

If you want to collapse this down to one row per every stay, you could change shorten this to:

morb_seq %>% 
  group_by(id) %>% 
  mutate(new_stay = as.numeric(adm_date-lag(dis_date))>0,
         stay_no = cumsum(if_else(is.na(new_stay),TRUE,new_stay))) %>% 
  group_by(id,stay_no) %>%
  summarize(discharge_date=max(dis_date))

Output:

     id stay_no discharge_date
  <dbl>   <int> <date>        
1     1       1 2012-12-14    
2     1       2 2013-01-19    
3     1       3 2013-06-04    
4     1       4 2015-04-03    
5     2       1 2015-04-03  

Upvotes: 2

Related Questions