Reputation: 1209
I'm looking to analyze membership gaps for a group of customers, Where the effective date is where their membership began, and termination date is when their membership ended. A termination date of 2100-01-01 means a current member.
I thought a good way to do this would be to create a bit matrix from a membership database I have. I would like membership id as rows and each day as column.
The table is as follows:
member_id|effective_date|termination_date
1 | 2015-06-12 | 2015-12-19
1 | 2016-03-17 | 2016-06-23
2 | 2015-12-03 | 2100-01-01
I have it working doing multiple loops, but it takes forever. What are some ways I can make this more efficient using R. My end goal is to identify customers with less than 2 gaps of membership in the past year and the gap is less than 60 days.
Thanks for the help.
Edit: I'll want to be able change the year range. I currently have
past_year = c(seq(as.Date('2015-07-01'),as.Date('2016-06-30'),'day'))
I would consider a gap anytime a member didn't have membership during the range I'm looking at. In the example above for past_year, a gap would be any time they aren't a member between 2015-07-01 and 2016-06-30. Both members in the table would be considered to have gaps.
Upvotes: 0
Views: 155
Reputation: 11957
Perhaps a script like this.
The dplyr
library is really handy for these types of aggregations. Load that and create some example data:
library(dplyr)
data.example <- data.frame(
member_id = c(1, 1, 2, 3),
effective_date = as.Date(c('2015-06-12', '2016-03-17', '2015-12-03', '2010-01-01')),
termination_date = as.Date(c('2015-12-19', '2016-06-23', '2100-01-01', '2010-02-01'))
)
Your year range can be just two values:
past_year = as.Date(c('2015-07-01', '2016-06-30'))
Restrict your analysis to only members with dates in your range:
data.in.range <- subset(data.example,
!((termination_date < min(past_year) & effective_date < min(past_year)) |
(termination_date > max(past_year) & effective_date > max(past_year)))
)
Compute number of days that this member had membership, and report whether there was more than 1 membership period, or membership for less time than the span of your date range.
gaps <- group_by(data.in.range, member_id) %>%
summarize(
num.entries = length(member_id),
num.days = sum(termination_date - effective_date),
has.gap = num.days < abs(diff(past_year)) | num.entries > 1
)
member_id num.entries num.days has.gap
<dbl> <int> <S3: difftime> <lgl>
1 1 2 288 days TRUE
2 2 1 30710 days FALSE
I'm not sure if this is precisely what you're after, but in any event, computing off the ranges should be much faster than creating bit values for each day of the range and looping through.
Upvotes: 1