Reputation: 93
I'm working on a project and trying to create a plot of the number of open cases we have on any given date. An example of the data table is as follows.
case_files <- tibble(case_id = 1:10,
date_opened = c("2017-1-1",
"2017-1-1",
"2017-3-4",
"2017-4-4",
"2017-5-5",
"2017-5-6",
"2017-6-7",
"2017-6-6",
"2017-7-8",
"2017-7-8"),
date_closed = c("2017-4-1",
"2017-4-1",
"2017-5-4",
"2017-7-4",
"2017-7-5",
"2017-7-6",
"2017-8-7",
"2017-8-6",
"2017-9-8",
"2017-10-8"))
case_files$date_opened <- as.Date(case_files$date_opened)
case_files$date_closed <- as.Date(case_files$date_closed)
What I'm trying to do is create another data frame with the dates from the past year and the number of cases that are considered "Open" during each date. I would then be able to plot from this data frame.
daily_open_cases <- tibble(n = 0:365,
date = today() - n,
qty_open = .....)
Cases are considered Open on dates on orafter the date_opened
AND on or before the date_closed
I've considered doing conditional subsetting and then using nrow()
, but can't seem to get it to work. There must be an easier way to do this. I can do this easily in Excel using the COUNTIFS function.
Thanks!
Upvotes: 0
Views: 582
Reputation: 2535
Here's a 'tidyverse' solution, the approach is the same as the one of 42 I just used dplyr
s group_by
and mutate
instead of base-r sapply
.
library(tidyverse)
library(magrittr)
days_files <- tibble(
date = as.Date("2017-01-01")+0:365,
no_open = NA_integer_
)
days_files %<>%
group_by(date) %>%
mutate(
no_open = sum(case_files$date_opened <= date & case_files$date_closed >= date)
)
# A tibble: 366 x 2
# Groups: date [366]
date no_open
<date> <int>
1 2017-01-01 2
2 2017-01-02 2
3 2017-01-03 2
4 2017-01-04 2
5 2017-01-05 2
6 2017-01-06 2
7 2017-01-07 2
8 2017-01-08 2
9 2017-01-09 2
10 2017-01-10 2
# ... with 356 more rows
Upvotes: 0
Reputation: 263301
The Excel funtion basically does a sum of logical 1's and 0's. Easy to do in R with sum function. I'd build a structure that had all the dates and then march through those dates summing up the logical vectors using the two inequalities below across the all paired rows in the case_files
structure. The &
-function in R is vectorized:
daily_open_cases <- tibble(dt = as.Date("2017-01-01")+0:365,
qty_open = NA)
daily_open_cases$qty_open = sapply(daily_open_cases$dt,
function(d) sum(case_files$date_opened <= d & case_files$date_closed >=d) )
> head( daily_open_cases)
# A tibble: 6 x 2
dt qty_open
<date> <int>
1 2017-01-01 2
2 2017-01-02 2
3 2017-01-03 2
4 2017-01-04 2
5 2017-01-05 2
6 2017-01-06 2
>
Upvotes: 2