Reputation: 165
I have a question concerning survival analysis. I have the following data (just an excerpt):
Now I am trying to do survival analysis with the Python lifelines package. For example, I want to find out whether T-cells influence overall survival (OS). As far as I know, I need to categorize the number of T-cells into different categories, e.g. High T-Cell and Low T-Cell. Is that right? And how do I find the best-fitting cut-off? My plan is to show that tumors with high T-cell counts have better survival than tumors with low T-cell counts. How could I find the best cut-off value to discriminate between High and Low T-Cell from the data I have here?
Does anyone have an idea? A friend of mine said something about "ROC" analysis, but I am really confused now... I would be glad about any help!
Upvotes: 1
Views: 323
Reputation: 355
As gdrouard suggested, categorizing might not be your best option. Using a suitable time-to-event regression model (such as the Cox proportional hazards model) is usually preferable when analyzing continuous variables. The reason is that artificially categorizing a continuous variable throws away information, and it may also lead to bias in some scenarios.
If you want to visualize the effect of the continuous covariate on the time-to-event outcome afterwards, you may be interested in the contsurvplot R-package (https://github.com/RobinDenz1/contsurvplot) I created. You can simply plug your regression model into one of the included plot functions and get a nice plot of the effect. More information can be found in the associated preprint: https://arxiv.org/pdf/2208.04644.pdf
Upvotes: 1
Reputation: 126
The transformation of continuous variables into categorical variables is far from obvious. A first approach can be based on the existing literature, especially in medicine/biology. A review of the existing literature may be sufficient to create these classes. Another method can be based on the empirical distribution of the T-Cells variable, which sometimes highlights an "obvious" categorization. The use of an ROC curve can be a good idea, but I don't think it is necessary. Kaplan-Meier type survival analyses do require a categorized variable, but if you use Cox models there is no need to categorize it. So I would advise you to turn to Cox regressions to conduct your survival analysis. A Cox regression would also allow you to add several predictors to your model as well as interaction terms, which is more convenient.
Upvotes: 1