Ehsan Estiri

Prediction limit in normal regression and survival regression

I am trying to predict how long it takes for gas pipes to leak. I use 15 features, the most important of which is “pipe installation year”. The latest leak in my data happened in 2017, on a pipe installed in 2009, and I know that the regular ML models I built will not do a good job of predicting the leak duration for pipes installed after 2009.

I say this because of the following test. I first sorted the data by “installation year” and ran a normal train/test split, in which the subsets are chosen randomly; predicting the test set gave an R² of 93%. Then I turned shuffling off in the train/test split, so that the data stays in order: the first 80% of rows become the training set and the last 20% the test set. This checks whether the model can predict pipes whose installation years were not in the training data, and I got an R² of only 30%. Since “installation year” is such an important feature, the ML model cannot predict pipes whose installation years were not seen during training.
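
The effect described above can be reproduced on made-up data. Below is a minimal, standard-library-only sketch, assuming a single feature (installation year) and a synthetic linear trend; `make_pipes`, `knn_predict`, `r2_score` and `evaluate` are hypothetical helpers, and a k-nearest-neighbours regressor stands in for whatever model was actually used. Like tree-based models, it predicts from nearby training rows, so for years later than any seen in training it can only return values typical of the latest training years.

```python
import random


def make_pipes(first_year=1950, last_year=2009, per_year=5, seed=0):
    """Synthetic, made-up pipes: (installation_year, years_until_leak)."""
    rng = random.Random(seed)
    rows = []
    for year in range(first_year, last_year + 1):
        for _ in range(per_year):
            # Hypothetical trend: older pipes took longer to leak, plus noise.
            duration = 0.5 * (2010 - year) + rng.gauss(0, 1)
            rows.append((year, duration))
    return rows


def knn_predict(train, year, k=5):
    """Average leak duration of the k training pipes closest in installation year."""
    nearest = sorted(train, key=lambda row: abs(row[0] - year))[:k]
    return sum(duration for _, duration in nearest) / k


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


def evaluate(rows, shuffle, test_frac=0.2, seed=0):
    """Sort by installation year, optionally shuffle, then hold out the last rows as test."""
    rows = sorted(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * (1 - test_frac))
    train, test = rows[:cut], rows[cut:]
    preds = [knn_predict(train, year) for year, _ in test]
    return r2_score([duration for _, duration in test], preds)


if __name__ == "__main__":
    pipes = make_pipes()
    print("shuffled split R^2:     ", round(evaluate(pipes, shuffle=True), 3))
    print("chronological split R^2:", round(evaluate(pipes, shuffle=False), 3))
```

On this synthetic data the shuffled split scores far higher than the chronological one, because in the chronological split every test pipe was installed later than every training pipe, so each prediction falls back on the latest training years.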

In addition to the regular ML models, I am also using survival regression models. I am not sure whether the Cox PH model and other multivariate survival models will have the same problem. Will a Cox PH model be able to predict the hazard ratio and survival function for pipes installed after 2009?
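
For reference, this is how a fitted Cox PH model turns a covariate vector x into predictions, assuming installation year enters as a numeric covariate; β_year is a hypothetical name for its coefficient and S_0(t) is the baseline survival function.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
S(t \mid \mathbf{x}) = S_0(t)^{\exp\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)}

% Two pipes that differ only in installation year:
\mathrm{HR} = \frac{h(t \mid \mathbf{x}_a)}{h(t \mid \mathbf{x}_b)}
            = \exp\!\left(\beta_{\text{year}}\,(\text{year}_a - \text{year}_b)\right),
\qquad \text{e.g. } \mathrm{HR}_{2012\ \text{vs}\ 2009} = \exp\!\left(3\,\beta_{\text{year}}\right)
```

So the model will return a hazard ratio and survival curve for any installation year, including years after 2009, but it does so by extending the log-linear effect of installation year beyond the range it was estimated on, and S_0(t) is only estimated up to the longest follow-up time in the data.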

Answers (1)

Oka

Will coxph be able to predict the hazard ratio and survival function for the pipes that were installed after 2009? coxph should be able to calculate the hazard ratio and survival function for a given period; that is what it is designed to do. You can fit the model, plot a Kaplan-Meier (KM) curve to check whether the results make sense, and then use them.
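
The Kaplan-Meier check suggested above can be sketched without any library. This is a minimal standard-library version, assuming each pipe has a duration (years from installation to leak, or to the end of observation if it has not leaked) and an event flag (1 = leaked, 0 = still intact, i.e. right-censored); `kaplan_meier` is a hypothetical helper, not the API of the R `survival` package or of `lifelines`. Computing it separately for installation-year cohorts gives empirical curves to compare against the Cox predictions, though only for cohorts that actually appear in the data.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t), returned as [(t, S(t))] at each leak time.

    durations: years from installation to leak, or to end of observation.
    events:    1 if the pipe leaked, 0 if it was still intact (right-censored).
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaks = removed = 0
        # Group all pipes whose duration equals t (leaks and censored alike).
        while i < len(data) and data[i][0] == t:
            leaks += data[i][1]
            removed += 1
            i += 1
        if leaks:
            # Product-limit step: multiply by the fraction still intact after t.
            survival *= 1 - leaks / at_risk
            curve.append((t, survival))
        # Pipes that leaked or were censored at t leave the risk set afterwards.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy, made-up data: five pipes, two still intact at the end of observation.
    print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

For the toy input above, the estimated survival is about 0.8, 0.6 and 0.3 at t = 2, 3 and 5; the pipe censored at t = 3 still counts as at risk at that time, which is the usual convention.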