How do I count across a date range in an R datatable

Question

ID      FROM        TO
1881    11/02/2013  11/02/2013
3090    09/09/2013  09/09/2013
1113    24/11/2014  06/12/2014
1110    24/07/2013  25/07/2013
111     25/06/2015  05/09/2015

If I have data.table of vacation dates, FROM and TO, I want to know how many people were on vacation for any given month.

I tried:

dt[, .N, by=.(year(FROM), month(FROM))]

but obviously it would exclude people who were on vacation across two months. ie. someone on vacation FROM JAN TO FEB would only show up in the JAN count and not the FEB count even though they are still on vacation in FEB

The output of the above code showing year, month and number is exactly what I'm looking for otherwise.

    year month N
 1: 2013     2 17570
 2: 2013     9 16924
 3: 2014    11 18809
 4: 2013     7 16984
 5: 2015     6 14401
 6: 2015    12 10239
 7: 2014     3 19346
 8: 2013     5 14864

EDIT: I want every month someone is away on vacation counted. So ID 111 would be counted in June, July, August and Sept.

EDIT 2:

Running uwe's code on the full dataset produces the Total Count column below.
Subsetting the full data set for people on vacation for <= 30 days and > 30 days produces the counts in the respective columns below. These columns added to each other should equal the Total Count and therefore the DIFFERENCE should be 0 but this isn't the case.

month Total count <=30 >30 (<=30) + (>30) DIFFERENCE 01/02/2012 899 4 895 899 0 01/03/2012 3966 2320 1646 3966 0 01/04/2012 8684 6637 2086 8723 39 01/05/2012 10287 7586 2750 10336 49 01/06/2012 12018 9080 3000 12080 62

Uwe · Accepted Answer

The OP has not specified what the exact rules are for counting, for instance, how to count if the same ID has multiple non-overlapping periods of vacation in the same month.

The solution below is based on the following rules:

Each ID may appear in more than one row.
For each row, the total number of month between FROM and TO are counted (including the FROM and TO months). E.g., ID 111 is counted in the months of June, July, August, and September 2015.
Vacation on the last and first day of a month are accounted in full, e.g., vacations starting on May 31 and ending on June 1, are counted in both months.
If an ID has multiple periods of vacation in one month it is only counted once.

To verify that the code implements these rule, I had to enhance the sample dataset provided by the OP with additional use cases (see Data section below)

library(data.table)
library(lubridate)
# coerce dt to data.table object and character dates to class Date
setDT(dt)[, (2:3) := lapply(.SD, dmy), .SDcols = 2:3]

# for each row, create sequence of first days of months 
dt[, .(month = seq(floor_date(FROM, "months"), TO, by = "months")), by = .(ID, rowid(ID))][
  # count the number of unique IDs per month, order result by month
  , uniqueN(ID), keyby = month]

         month V1
 1: 2013-02-01  1
 2: 2013-07-01  1
 3: 2013-09-01  2
 4: 2014-11-01  1
 5: 2014-12-01  1
 6: 2015-06-01  1
 7: 2015-07-01  1
 8: 2015-08-01  1
 9: 2015-09-01  1
10: 2015-11-01  1
11: 2015-12-01  1
12: 2016-06-01  1
13: 2016-07-01  1
14: 2016-08-01  1
15: 2016-09-01  1

Data

Based on OP's sample dataset but extended by additional use cases:

library(data.table)
dt <- fread(
  "ID      FROM        TO
1881    11/02/2013  11/02/2013
1881    23/02/2013  24/02/2013
3090    09/09/2013  09/09/2013
3091    09/09/2013  09/09/2013
1113    24/11/2014  06/12/2014
1110    24/07/2013  25/07/2013
111     25/06/2015  05/09/2015
111     25/11/2015  05/12/2015
11      25/06/2016  01/09/2016"
)

How do I count across a date range in an R datatable

Answers (2)

Data

Related Questions