Reputation: 181
ID FROM TO
1881 11/02/2013 11/02/2013
3090 09/09/2013 09/09/2013
1113 24/11/2014 06/12/2014
1110 24/07/2013 25/07/2013
111 25/06/2015 05/09/2015
If I have data.table of vacation dates, FROM and TO, I want to know how many people were on vacation for any given month.
I tried:
dt[, .N, by=.(year(FROM), month(FROM))]
but obviously it would exclude people who were on vacation across two months. ie. someone on vacation FROM JAN TO FEB would only show up in the JAN count and not the FEB count even though they are still on vacation in FEB
The output of the above code showing year, month and number is exactly what I'm looking for otherwise.
year month N
1: 2013 2 17570
2: 2013 9 16924
3: 2014 11 18809
4: 2013 7 16984
5: 2015 6 14401
6: 2015 12 10239
7: 2014 3 19346
8: 2013 5 14864
EDIT: I want every month someone is away on vacation counted. So ID 111 would be counted in June, July, August and Sept.
EDIT 2:
Running uwe's code on the full dataset produces the Total Count column below.
Subsetting the full data set for people on vacation for <= 30 days and > 30 days produces the counts in the respective columns below. These columns added to each other should equal the Total Count and therefore the DIFFERENCE should be 0 but this isn't the case.
month Total count <=30 >30 (<=30) + (>30) DIFFERENCE
01/02/2012 899 4 895 899 0
01/03/2012 3966 2320 1646 3966 0
01/04/2012 8684 6637 2086 8723 39
01/05/2012 10287 7586 2750 10336 49
01/06/2012 12018 9080 3000 12080 62
Upvotes: 0
Views: 727
Reputation: 42582
The OP has not specified what the exact rules are for counting, for instance, how to count if the same ID has multiple non-overlapping periods of vacation in the same month.
The solution below is based on the following rules:
ID
may appear in more than one row.FROM
and TO
are counted (including the FROM
and TO
months). E.g., ID
111 is counted in the months of June, July, August, and September 2015.ID
has multiple periods of vacation in one month it is only counted once.To verify that the code implements these rule, I had to enhance the sample dataset provided by the OP with additional use cases (see Data section below)
library(data.table)
library(lubridate)
# coerce dt to data.table object and character dates to class Date
setDT(dt)[, (2:3) := lapply(.SD, dmy), .SDcols = 2:3]
# for each row, create sequence of first days of months
dt[, .(month = seq(floor_date(FROM, "months"), TO, by = "months")), by = .(ID, rowid(ID))][
# count the number of unique IDs per month, order result by month
, uniqueN(ID), keyby = month]
month V1 1: 2013-02-01 1 2: 2013-07-01 1 3: 2013-09-01 2 4: 2014-11-01 1 5: 2014-12-01 1 6: 2015-06-01 1 7: 2015-07-01 1 8: 2015-08-01 1 9: 2015-09-01 1 10: 2015-11-01 1 11: 2015-12-01 1 12: 2016-06-01 1 13: 2016-07-01 1 14: 2016-08-01 1 15: 2016-09-01 1
Based on OP's sample dataset but extended by additional use cases:
library(data.table)
dt <- fread(
"ID FROM TO
1881 11/02/2013 11/02/2013
1881 23/02/2013 24/02/2013
3090 09/09/2013 09/09/2013
3091 09/09/2013 09/09/2013
1113 24/11/2014 06/12/2014
1110 24/07/2013 25/07/2013
111 25/06/2015 05/09/2015
111 25/11/2015 05/12/2015
11 25/06/2016 01/09/2016"
)
Upvotes: 2
Reputation: 79338
for the data given above, you will do:
melt(dat,1)[,value:=as.Date(sub("\\d+","20",value),"%d/%m/%Y")][,
seq(value[1],value[2],by="1 month"),by=ID][,.N,by=.(year(V1),month(V1))]
year month N
1: 2013 2 1
2: 2013 9 1
3: 2014 11 1
4: 2014 12 1
5: 2013 7 1
6: 2015 6 1
7: 2015 7 1
8: 2015 8 1
9: 2015 9 1
Upvotes: 0