RA_R

Reputation: 3

Cox model stratifying by clinical center

I have multicenter clinical trial data pooled from two similarly designed studies. I am using Cox regression to model the hazard ratio of an event as a function of treatment. The studies record the clinical center for each patient, but there are too many centers and each one enrolled only a few patients. How do I deal with this in the Cox model?

Ex: coxph(outcomes ~ treatment_status) is the model that does not account for clinical centers.

What would my final model be if it has to account for the problem of too many centers with low enrollment?

Upvotes: 0

Views: 151

Answers (1)

Tim Hirst

Reputation: 614

It sounds like you've assumed that the individual clinic has an impact on the 'event' you're modelling. First off, it's worth checking that assumption yourself. If all of the clinics are treating all of the subjects identically, then why use 'clinic' as a variable at all? After all, if treatments are identical, then which clinic a treatment is done in shouldn't be a predictor!

Now, if you read that paragraph and thought to yourself "Who does this guy think he is, he knows nothing about my industry! Of course the clinic influences the likelihood of the event!" then that's great too. Chances are that, as you thought that, some features of clinics came to mind: cleanliness, presence of a certain type of expert, equipment, proximity to nearest llama, that kind of thing.

Instead of using the clinic id itself as an input variable, use the features of the clinics that you have a hypothesis about (even data science should follow the scientific method sometimes!). Your model then no longer says "clinic 123's hazard is such and such". Instead it says "clinics with piece of equipment y have a hazard of such and such". That should reduce the size of your input space and stop the overfitting / overgranularisation that you're seeing.
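To make that concrete, here is a minimal sketch in R using the survival package. Everything in it is an assumption for illustration: the data are simulated, and the names clinics, patients, center, has_equipment_y, time and event are hypothetical stand-ins for whatever your own tables contain. It only shows the mechanics of swapping the clinic id for a clinic-level feature; it is not a recommendation for your final model.

```r
library(survival)

set.seed(1)

# Hypothetical clinic-level table: one row per center, with a feature
# we have a hypothesis about (here, whether the center has equipment y).
clinics <- data.frame(
  center = 1:40,
  has_equipment_y = rbinom(40, 1, 0.5)
)

# Hypothetical patient-level table: many centers, few patients in each.
n <- 200
patients <- data.frame(
  center = sample(clinics$center, n, replace = TRUE),
  treatment = rbinom(n, 1, 0.5),
  time = rexp(n, rate = 0.1),      # simulated follow-up time
  event = rbinom(n, 1, 0.7)        # 1 = event observed, 0 = censored
)

# Attach the clinic feature to every patient row.
dat <- merge(patients, clinics, by = "center")

# The clinic enters the model through its feature,
# not through 40 separate center terms.
fit <- coxph(Surv(time, event) ~ treatment + has_equipment_y, data = dat)
summary(fit)
```

exp(coef(fit)) then gives the hazard ratios for treatment and for the clinic feature, so the center effect costs one coefficient per feature rather than one per center.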

One final thought: be careful to pick features of the clinics themselves, and not of the population they serve. Particular clinics may serve particular demographics, ethnicities, or conditions, and their results may be very different, but that would be a feature of the patient, not the clinic, and (I would think) should therefore be controlled for by patient data, not clinic data.

Hope that helps!

Upvotes: 1
