Reputation: 73
I have data frame df with customer names, date of joining, expiry and cohort.
names dj exp cohort
(fctr) (date) (date) (chr)
1 Tom 2011-05-01 2011-06-22 2011-05
2 David 2011-06-01 2011-07-19 2011-06
3 Jack 2011-05-03 2012-01-03 2011-05
>
names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
df$cohort<-format(df$dj,"%Y-%m")
tbl_df(df)
And a vector DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")
with calendar dates from 1st of May 2011 to 1st Dec 2015
From which I would like to check whether the customer is active in a particular calendar month or not. Active is defined as a customer's exp>DateColumns & dj<=DateColumns
Output 1 (Correct):
names dj exp cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01 .
Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE
David 2011-06-01 2011-07-19 2011-06 FALSE TRUE TRUE FALSE
Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE ....
Below is the code I wrote, unfortunately this not able to compare both expiry date and dj with the calendar date in date columns. For example David should be false in X1. So, how do I do that?
Incorrect output
names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
df$cohort<-format(df$dj,"%Y-%m")
DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")
DateColumnvalues <- t(sapply(df$exp, function(x) x > DateColumns))
df2 <- data.frame(df,DateColumnvalues)
tbl_df(df2)
output:
names dj exp cohort X1 X2 X3 X4 X5 X6
(fctr) (date) (date) (chr) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
1 Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE
2 David 2011-06-01 2011-07-19 2011-06 **TRUE** TRUE TRUE FALSE FALSE FALSE
3 Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE TRUE TRUE
Variables not shown: X7 (lgl), X8 (lgl), X9 (lgl), X10 (lgl), X11 (lgl),
X12 (lgl), X13 (lgl), X14 (lgl), X15 (lgl), X16 (lgl), X17 (lgl), X18
(lgl), X19 (lgl), X20 (lgl), X21 (lgl), X22 (lgl), X23 (lgl), X24 (lgl),
X25 (lgl), X26 (lgl), X27 (lgl), X28 (lgl), X29 (lgl), X30 (lgl), X31
(lgl), X32 (lgl), X33 (lgl), X34 (lgl), X35 (lgl), X36 (lgl), X37 (lgl),
X38 (lgl), X39 (lgl), X40 (lgl), X41 (lgl), X42 (lgl), X43 (lgl), X44
(lgl), X45 (lgl), X46 (lgl), X47 (lgl), X48 (lgl), X49 (lgl), X50 (lgl),
X51 (lgl), X52 (lgl), X53 (lgl), X54 (lgl), X55 (lgl), X56 (lgl)
>
Note: X1 is "2011-05-01" and X2="2011-06-01" so on denoting calendar months
Second, I want to transform this data "relatively" by aggregating based on month of joining and cohort. For example, if a customer "Dick" joined in 2015-Jan and will expire in Dec 2015 his M0 should be set to true, but the M0 considered should Jan 2015 and not calendar month.
Names dj exp cohort M0 M1 M2 M3 till M55
Dick 2015-01-11 2015-12-10 2015-01 T T T T
Tom 2011-05-01 2011-06-22 2011-05 T T F F
David 2011-06-01 2011-07-19 2011-06 T T F F
Upvotes: 1
Views: 201
Reputation: 54237
As an answer to the first question, you could do
library(data.table)
library(lubridate)
dt <- data.table(df, key=c("dj", "exp"))
dates <- setDT(transform(data.frame(start = seq.Date(as.Date("2011-05-01"), as.Date("2011-08-01"), "1 month")),
end = start + months(1) - 1),
key = c("start", "end"))
dcast(foverlaps(dt, dates)[, val:=TRUE], names+dj+exp+cohort~start, value.var="val", fill=FALSE)
# names dj exp cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01
# 1: David 2011-06-01 2011-07-19 2011-06 FALSE TRUE TRUE FALSE
# 2: Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE
# 3: Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE
Concerning the 2nd question, If I understood it correctly, I'd go with
lst <- apply(df[2:3], 1, function(x) { x <- as.Date(x); as.logical(seq_along(seq(x[1], x[2], by="month"))) })
n <- max(lengths(lst))
res <- cbind(df, do.call(rbind, lapply(lst, function(x) `length<-`(x, n) )))
res[is.na(res)] <- FALSE; res
# names dj exp cohort 1 2 3 4 5 6 7 8 9
# 1 Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 2 David 2011-06-01 2011-07-19 2011-06 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 3 Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Upvotes: 1