Reputation: 1134
I am calculating the average Arrivals and average Occupancy across hours of the day with tidyverse only.
Yet, the example above does not actually calculate the average occupancy but rather the number of people at certain time.
Yet, if I have a person coming, lets say in a hospital, at Emergency department, arriving on 10th of December, 2018 at 10 am and leaves second day at 7:45. That means the Occupancy is of a value of 1.00 patient treated from 10am all the way to 7am next day(not including 8 am and 9 am). And averaging the Occupancy over two dates, that means, Occupancy is 0.5 for all the hours from 10am the date the patient arrived, to 7 am, when the patient was discharged, on the next day, excluding 8am and 9am (mean is 0). That is the same for Arrivals, with the difference that it calculates only for the time a patient has arrived and not for all the hours they stay. This is the difference between the Occupancy and Arrivals, which seems that all answers given in my previous help requests has solved the Arrivals averages and not the Occupancy, although I have requested the Averaged Occupancy.
Here is one example that I tried to solve in the past.
Calculating Occupancy in hospital from dates with time.
Which I reproduce bellow.
df <- structure(list(ID = c(101, 102, 103, 104, 105, 106, 107), Adm =
structure(c(1326309720, 1326309900, 1328990700, 1328997240,
1329000840, 1329004440, 1329004680),
class = c("POSIXct", "POSIXt"), tzone = ""), Disc =
structure(c(1326313800, 1326317340, 1326317460, 1326324660,
1326328260, 1 326335460, 1326335460),
class = c("POSIXct", "POSIXt"), tzone = "")),
.Names = c("ID", "Adm", "Disc"),
row.names = c(NA, -7L), class = "data.frame")
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(occupancy = ifelse(last(Disc) > first(Adm) + 60*60, 1, 0))
Here is a minimalist example, which is a reproducible data type I have, for the sake of simplicity. Yet, cannot disclose any data from original data for data protection reasons.
df <- structure(list(ID = 101:103,
`Admissions <- as.POSIXct(c("2018-12-10 09:30:00",
"2018-12-10 10:15:00",
"2018-12-11 08:05:00"),
tz = "Europe/London")` =
structure(c(1544434200, 1544436900, 1544519100),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/London"),
`Discharges <- as.POSIXct(c("2018-12-10 12:30:00",
"2018-12-11 07:45:00",
"2018-12-11 09:05-00"),
tz = "Europe/London")` =
structure(c(1544445000, 1544514300, 1544519100),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/London")), row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame"))
And the expected output is:
output <- structure(list(
Hour = 0:23,
Average_arrivals = c(0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Average_occ = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0.5, 1,
1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5)),
row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"),
spec = structure(list(cols = list(X1 =
structure(list(), class = c("collector_integer", "collector")),
Hour = structure(list(), class =c("collector_integer","collector")),
Average_arrivals = structure(list(),
class = c("collector_double", "collector")),
Average_occ = structure(list(), class = c("collector_double",
"collector"))),
default = structure(list(),
class = c("collector_guess","collector"))),
class = "col_spec"))
Upvotes: 1
Views: 308
Reputation: 66880
Here's an approach using the tidyverse. First I convert to long format using gather
, and then create a "change" column that is +1 for admissions and -1 for discharges.
Then I sum this up by hour (could be more granular, like "5 minutes" if desired) and add all the unmentioned hours by using padr:pad
(I also add extra hours on the back to make it a full set of 48 hours).
Occupancy is then the cumulative sum of the changes. By grouping by hour across the 2 days, we can get Average_arrivals and Average_occ.
Data
# Note, I could not load the sample data as provided, as the variable
# names included the desired data as text.
df <- data.frame(ID = 101:103,
Admissions = as.POSIXct(c("2018-12-10 09:30:00",
"2018-12-10 10:15:00", "2018-12-11 08:05:00")),
Discharges = as.POSIXct(c("2018-12-10 12:30:00",
"2018-12-11 07:45:00", "2018-12-11 09:05:00")))
Solution
df_flat <- df %>%
gather(status, time, Admissions:Discharges) %>%
mutate(change = if_else(status == "Admissions", 1, -1)) %>%
group_by(time_hr = lubridate::floor_date(time, "1 hour")) %>%
summarize(arrivals = sum(status == "Admissions"),
change = sum(change)) %>%
# Here, adding add'l rows so all hours have 2 instances
padr::pad(end_val = min(.$time_hr) + dhours(47)) %>%
replace_na(list(arrivals = 0, change = 0)) %>%
mutate(occupancy = cumsum(change))
output <- df_flat %>%
group_by(hour(time_hr)) %>%
summarize(Average_arrivals = mean(arrivals),
Average_occ = mean(occupancy))
Output
output
# A tibble: 24 x 3
# hour Average_arrivals Average_occ
# <int> <dbl> <dbl>
# 1 0 0 0.5
# 2 1 0 0.5
# 3 2 0 0.5
# 4 3 0 0.5
# 5 4 0 0.5
# 6 5 0 0.5
# 7 6 0 0.5
# 8 7 0 0
# 9 8 0.5 0.5
# 10 9 0.5 0.5
Upvotes: 0