Rashmi Shivanna

Reputation: 85

Predicting WHEN an event is going to occur

I am very new to machine learning and R, so my question might seem unclear or need more information. I have tried to explain as much as possible. Please correct me if I have used the wrong terminology. Any help on this will be greatly appreciated.

Context - I am trying to build a model to predict "when" an event is going to happen.

I have a dataset with the structure below. This is not the actual data; it is dummy data created to explain the scenario. The actual data cannot be shared due to confidentiality.

[Image: table of the dummy dataset; not available]

About data -

I have researched survival models. I understand that a survival model like Cox can be used to estimate the hazard function and to understand how each variable affects the time to event. I tried to use the predict function with a Cox model, but I did not understand whether any of the values passed to the "type" parameter can be used to predict the actual time, i.e. how I can predict the actual value for "WHEN" the limit will be crossed.

Maybe a survival model isn't the right approach for this scenario, so please advise me on the best way to approach this problem.

library(survival)

# define survival object
recsurv <- Surv(time = df$ExceedanceMonth, event = df$LimitReached)

# only for testing the code
# (string comparison; assumes the dates are stored as yyyymmdd strings)
train <- subset(df, SubStartDate >= "20150301" & SubEndDate <= "20180401")
test <- subset(df, SubStartDate > "20180401")

# refer to columns by name (not df$...) so that data = train and newdata = test are actually used
fit <- coxph(Surv(ExceedanceMonth, LimitReached) ~ SubDurationInMonths + `#subs` + LimitAmount + Monthlyutitlization + AvgMonthlyUtilization, data = train, model = TRUE)
predicted <- predict(fit, newdata = test)
head(predicted)

 1           2           3           4           5           6 
 0.75347328  0.23516619 -0.05535162 -0.03759123 -0.65658488 -0.54233043

Thank you in advance!

Upvotes: 1

Views: 1456

Answers (1)

SMzgr356

Reputation: 93

Survival models are fine for what you're trying to do. (I'm assuming you've estimated the model correctly from this point on.)

The key is understanding what comes out of the model. For a Cox model, the default quantity out of predict() is the linear predictor (b1x1 + b2x2 + ...; the Cox model doesn't estimate an intercept b0). That alone is a relative risk on the log scale and won't tell you anything about when.
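As a rough illustration of what the default prediction looks like, here is a minimal sketch. It uses the lung dataset bundled with the survival package as a stand-in for the asker's confidential data. The covariates age and sex and the row-based train/test split are assumptions made only for the example.

```r
library(survival)

# lung ships with the survival package; it stands in for the real data here
train <- lung[1:150, ]
test  <- lung[151:228, ]

fit <- coxph(Surv(time, status) ~ age + sex, data = train)

# Default type: the linear predictor, a log relative risk (no intercept)
lp <- predict(fit, newdata = test, type = "lp")

# Risk score exp(lp): a relative hazard, still not a time
risk <- predict(fit, newdata = test, type = "risk")

head(cbind(lp, risk))
```

Both quantities only rank subjects by risk. Neither is on a time scale, which is why the numbers in the question cannot be read as months.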

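Specifying type="expected" for predict() gets you closer, but it does not return a duration: it gives the expected number of events for each customer over the follow-up time (how long you watch the customer), which by default is the customer's actual duration (retrieved from the coxph model object). exp(-expected) is then the probability that the customer has not yet reached his/her data limit by that time. To get the "when" itself, take the predicted survival curve from survfit(fit, newdata = ...) and read off a summary such as the median time to reaching the limit.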
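A minimal sketch of how a time can be read off a fitted Cox model, again using the bundled lung data as a stand-in (the covariates and the split are assumptions for illustration). Per the survival documentation, type = "expected" returns the expected number of events over each subject's follow-up time, and exp(-expected) is the matching survival probability. The median of the predicted survival curve is one common way to state a "when".

```r
library(survival)

# lung stands in for the real data; the split is arbitrary
train <- lung[1:150, ]
test  <- lung[151:228, ]
fit <- coxph(Surv(time, status) ~ age + sex, data = train)

# Expected number of events by each subject's own follow-up time;
# for this type, newdata must also contain the time and status columns
expected <- predict(fit, newdata = test, type = "expected")
surv_prob <- exp(-expected)  # P(no event yet) at that follow-up time

# One predicted survival curve per test subject
curves <- survfit(fit, newdata = test)

# Median predicted time to event per subject
# (NA if a curve never drops to 0.5 within the observed time range)
median_time <- as.vector(quantile(curves, probs = 0.5, conf.int = FALSE))

head(data.frame(expected, surv_prob, median_time))
```

The median is used here rather than the mean because the mean of a survival curve is only defined up to the last observed time. Other quantiles can be requested through probs in the same way.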

The coxed package will also give you expected durations, calculated using a different method, without the need to worry about follow-up time. It's also a little more forgiving when it comes to inputting a newdata argument, particularly if you have a specific covariate profile in mind. See the package vignette here.

See also this thread for more on predict.coxph().

Upvotes: 1
