clau
clau

Reputation: 93

How to perform an R equivalent of Excel's COUNTIFS function across multiple variables in a data frame

I'm working on a project and trying to create a plot of the number of open cases we have on any given date. An example of the data table is as follows.

    case_files <- tibble(case_id = 1:10,
                         date_opened = c("2017-1-1",
                                         "2017-1-1",
                                         "2017-3-4",
                                         "2017-4-4",
                                         "2017-5-5",
                                         "2017-5-6",
                                         "2017-6-7",
                                         "2017-6-6",
                                         "2017-7-8",
                                         "2017-7-8"),
                         date_closed = c("2017-4-1",
                                         "2017-4-1",
                                         "2017-5-4",
                                         "2017-7-4",
                                         "2017-7-5",
                                         "2017-7-6",
                                         "2017-8-7",
                                         "2017-8-6",
                                         "2017-9-8",
                                         "2017-10-8"))
    case_files$date_opened <- as.Date(case_files$date_opened)              
    case_files$date_closed <- as.Date(case_files$date_closed)

What I'm trying to do is create another data frame with the dates from the past year and the number of cases that are considered "Open" during each date. I would then be able to plot from this data frame.

    daily_open_cases <- tibble(n = 0:365,
                               date = today() - n,
                               qty_open = .....)

Cases are considered Open on dates on orafter the date_opened AND on or before the date_closed

I've considered doing conditional subsetting and then using nrow(), but can't seem to get it to work. There must be an easier way to do this. I can do this easily in Excel using the COUNTIFS function.

Thanks!

Upvotes: 0

Views: 582

Answers (2)

snaut
snaut

Reputation: 2535

Here's a 'tidyverse' solution, the approach is the same as the one of 42 I just used dplyrs group_by and mutate instead of base-r sapply.

library(tidyverse)
library(magrittr)

days_files <- tibble(
  date = as.Date("2017-01-01")+0:365,
  no_open = NA_integer_
)

days_files %<>% 
  group_by(date) %>%
  mutate(
    no_open = sum(case_files$date_opened <= date & case_files$date_closed >= date)
  )

    # A tibble: 366 x 2
# Groups:   date [366]
         date no_open
       <date>   <int>
 1 2017-01-01       2
 2 2017-01-02       2
 3 2017-01-03       2
 4 2017-01-04       2
 5 2017-01-05       2
 6 2017-01-06       2
 7 2017-01-07       2
 8 2017-01-08       2
 9 2017-01-09       2
10 2017-01-10       2
# ... with 356 more rows

Upvotes: 0

IRTFM
IRTFM

Reputation: 263301

The Excel funtion basically does a sum of logical 1's and 0's. Easy to do in R with sum function. I'd build a structure that had all the dates and then march through those dates summing up the logical vectors using the two inequalities below across the all paired rows in the case_files structure. The &-function in R is vectorized:

daily_open_cases <- tibble(dt = as.Date("2017-01-01")+0:365,
                                   qty_open = NA)
daily_open_cases$qty_open = sapply(daily_open_cases$dt, 
       function(d) sum(case_files$date_opened <= d & case_files$date_closed >=d) )


> head( daily_open_cases)
# A tibble: 6 x 2
          dt qty_open
      <date>    <int>
1 2017-01-01        2
2 2017-01-02        2
3 2017-01-03        2
4 2017-01-04        2
5 2017-01-05        2
6 2017-01-06        2
> 

Upvotes: 2

Related Questions