Neil
Neil

Reputation: 8247

How to find decline trend in amount value in r

I am working on prediction problem of churn analysis, where I have to predict whether given customer will go dormant or not. I have a variable called customer value,what I want to find is whether that value has declined trend for continuous 3 months and when it started declining.

Client_ID       YearMonth       Amount
  ABC             201607         1000
  ABC             201606         2000
  ABC             201605         3000
  ABC             201608         3500
  ABC             201604         2500

Likewise I have customers database for 2 years and more than 10K customers in the dataset. In above dataframe Client ABC has declining trend from 201605,201606,201607 So I want to flag this customer in 201605 and want to identify with the help of other variables that why this customer has reduced his revenue.

Desired Dataframe would be

Client_ID       YearMonth       Amount    Decline_Flag
  ABC             201607         1000         0
  ABC             201606         2000         0
  ABC             201605         3000         1
  ABC             201608         3500         0
  ABC             201604         2500         0

How can I achieve this in r ?

Upvotes: 0

Views: 271

Answers (2)

WHoekstra
WHoekstra

Reputation: 173

You might consider using a running average delta against the previous measurement instead of this boolean. This method treats a minimal decrease as equivalent to a massive one, which may influence the quality of your predictions negatively down the line.

Upvotes: 1

Marius
Marius

Reputation: 60070

Here I assume that you have data available for each month, so that consecutive rows are actually a month apart. It would be relatively easy to adapt it even if that wasn't the case.

# Libraries: stringr to extract the years and months,
# dplyr and tidyr for pipe operations
library(stringr)
library(dplyr)
library(tidyr)

df = read.table(text=tab, header=TRUE)
df = df %>%
    mutate(
        year = as.numeric(str_sub(YearMonth, 1, 4)),
        month = as.numeric(str_sub(YearMonth, 5, 6))
    ) %>%
    arrange(Client_ID, year, month) %>%
    group_by(Client_ID) %>%
    mutate(decline = c(FALSE, diff(Amount) < 0),
           # Look at 2 upcoming rows and see if there was a decline in
           # both of them
           decline3 = (lead(decline, 1, default=FALSE) & lead(decline, 2, default=FALSE)))

Then the "3 month decline" flag is stored in the decline3 column:

> df
Source: local data frame [5 x 7]
Groups: Client_ID [1]

  Client_ID YearMonth Amount  year month decline decline3
     <fctr>     <int>  <int> <dbl> <dbl>   <lgl>    <lgl>
1       ABC    201604   2500  2016     4   FALSE    FALSE
2       ABC    201605   3000  2016     5   FALSE     TRUE
3       ABC    201606   2000  2016     6    TRUE    FALSE
4       ABC    201607   1000  2016     7    TRUE    FALSE
5       ABC    201608   3500  2016     8   FALSE    FALSE

Upvotes: 1

Related Questions