Reputation: 8247
I am working on prediction problem of churn analysis, where I have to predict whether given customer will go dormant or not. I have a variable called customer value,what I want to find is whether that value has declined trend for continuous 3 months and when it started declining.
Client_ID YearMonth Amount
ABC 201607 1000
ABC 201606 2000
ABC 201605 3000
ABC 201608 3500
ABC 201604 2500
Likewise I have customers database for 2 years and more than 10K customers in the dataset. In above dataframe Client ABC
has declining trend from 201605,201606,201607
So I want to flag this customer in 201605
and want to identify with the help of other variables that why this customer has reduced his revenue.
Desired Dataframe would be
Client_ID YearMonth Amount Decline_Flag
ABC 201607 1000 0
ABC 201606 2000 0
ABC 201605 3000 1
ABC 201608 3500 0
ABC 201604 2500 0
How can I achieve this in r ?
Upvotes: 0
Views: 271
Reputation: 173
You might consider using a running average delta against the previous measurement instead of this boolean. This method treats a minimal decrease as equivalent to a massive one, which may influence the quality of your predictions negatively down the line.
Upvotes: 1
Reputation: 60070
Here I assume that you have data available for each month, so that consecutive rows are actually a month apart. It would be relatively easy to adapt it even if that wasn't the case.
# Libraries: stringr to extract the years and months,
# dplyr and tidyr for pipe operations
library(stringr)
library(dplyr)
library(tidyr)
df = read.table(text=tab, header=TRUE)
df = df %>%
mutate(
year = as.numeric(str_sub(YearMonth, 1, 4)),
month = as.numeric(str_sub(YearMonth, 5, 6))
) %>%
arrange(Client_ID, year, month) %>%
group_by(Client_ID) %>%
mutate(decline = c(FALSE, diff(Amount) < 0),
# Look at 2 upcoming rows and see if there was a decline in
# both of them
decline3 = (lead(decline, 1, default=FALSE) & lead(decline, 2, default=FALSE)))
Then the "3 month decline" flag is stored in the decline3
column:
> df
Source: local data frame [5 x 7]
Groups: Client_ID [1]
Client_ID YearMonth Amount year month decline decline3
<fctr> <int> <int> <dbl> <dbl> <lgl> <lgl>
1 ABC 201604 2500 2016 4 FALSE FALSE
2 ABC 201605 3000 2016 5 FALSE TRUE
3 ABC 201606 2000 2016 6 TRUE FALSE
4 ABC 201607 1000 2016 7 TRUE FALSE
5 ABC 201608 3500 2016 8 FALSE FALSE
Upvotes: 1