Reputation: 19870
I know there is COXPHFIT function in MATLAB to do Cox regression, but I have problems understanding how to apply it.
1) How to compare two groups of samples with survival data in days (survdays
), censoring (cens
) and some predictor value (x
)? The groups defined by groups
logical variable. Groups have different number of samples.
2) What is the baseline parameter in coxphfit? I did read the docs, but how should I choose the baseline properly?
It would be great if you know a site with good detailed examples on medical survival data. I found only the Mathworks demo that does not even mention coxphfit.
Do you know may be another 3rd party function for Cox regression?
UPDATE: The r
tag added since the answer I've got is for R.
Upvotes: 0
Views: 3151
Reputation: 121077
With survival analysis, the hazard function is the instantaneous death rate.
In these analyses, you are typically measuring what effect something has on this hazard function. For example, you may ask "does swallowing arsenic increase the rate at which people die?". A background hazard is the level at which people would die anyway (without swallowing arsenic, in this case).
If you read the docs for coxphfit
carefully, you will notice that that function tries to calculate the baseline hazard; it is not something that you enter.
baseline The X values at which to compute the baseline hazard.
EDIT: MATLAB's coxphfit
function doesn't obviously work with grouped data. If you are happy to switch to R, then the anaylsis is a one-liner.
library(survival)
#Create some data
n <- 20;
dfr <- data.frame(
survdays = runif(n, 5, 15),
cens = runif(n) < .3,
x = rlnorm(n),
groups = rep(c("first", "second"), each = n / 2)
)
#The Cox ph analysis
summary(coxph(Surv(survdays, cens) ~ x / groups, dfr))
ANOTHER EDIT: That baseline
parameter to MATLAB's coxphfit
appears to be a normalising constant. R's coxph
function doesn't have an equivalent parameter. I looked in Statistical Computing by Michael Crawley and it seems to suggest that the baseline hazard isn't important, since it cancels out when you calculate the likelihood of your individual dying. See Chapter 33, and p615-616 in particular. My knowledge of how the model works isn't deep enough to explain the discrepancy in the MATLAB and R implementations; perhaps you could ask on the Stack Exchange Stats Analysis site.
Upvotes: 1