Fitting survival density curves using different distributions

Question

I am working with some log-normal data, and naturally I want to demonstrate log-normal distribution results in a better overlap than other possible distributions. Essentially, I want to replicate the following graph with my data:

where the fitted density curves are juxtaposed over log(time).

The text where the linked image is from describes the process as fitting each model and obtaining the following parameters:

For that purpose, I fitted four naive survival models with the above-mentioned distributions:

survreg(Surv(time,event)~1,dist="family")

and extracted the shape parameter (α) and the coefficient (β).

I have several questions regarding the process:

1) Is this the right way of going about it? I have looked into several R packages but couldn't locate one that plots density curves as a built-in function, so I feel like I must be overlooking something obvious.

2) Do the values corresponding log-normal distribution (μ and σ$^2$) just the mean and the variance of the intercept?

3) How can I create a similar table in R? (Maybe this is more of a stack overflow question) I know I can just cbind them manually, but I am more interested in calling them from the fitted models. survreg objects store the coefficient estimates, but calling survreg.obj$coefficients results a named number vector (instead of just a number).

4) Most importantly, how can I plot a similar graph? I thought it would be fairly simple if I just extract the parameters and plot them over the histrogram, but so far no luck. The author of the text says he estimated the density curves from the parameters, but I just get a point estimate - what am I missing? Should I calculate the density curves manually based on distribution before plotting?

I am not sure how to provide a mwe in this case, but honestly I just need a general solution for adding multiple density curves to survival data. On the other hand, if you think it will help, feel free to recommend a mwe solution and I will try to produce one.

Thanks for your input!

Edit: Based on eclark's post, I have made some progress. My parameters are:

Dist = data.frame(
Exponential = rweibull(n = 10000, shape = 1, scale = 6.636684),
Weibull = rweibull(n = 10000, shape = 6.068786, scale = 2.002165),
Gamma = rgamma(n = 10000, shape = 768.1476, scale = 1433.986),
LogNormal = rlnorm(n = 10000, meanlog = 4.986, sdlog = .877)
)

However, given the massive difference in scales, this is what I get:

Going back to question number 3, is this how I should get the parameters? Currently this is how I do it (sorry for the mess):

summary(fit.exp)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "exponential")
        Value Std. Error   z p
(Intercept)  6.64      0.052 128 0

Scale fixed at 1 

Exponential distribution
Loglik(model)= -2825.6   Loglik(intercept only)= -2825.6
Number of Newton-Raphson Iterations: 6 
n= 397 

summary(fit.wei)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "weibull")
        Value Std. Error    z        p
(Intercept) 6.069     0.1075 56.5 0.00e+00
Log(scale)  0.694     0.0411 16.9 6.99e-64

Scale= 2 

Weibull distribution
Loglik(model)= -2622.2   Loglik(intercept only)= -2622.2
Number of Newton-Raphson Iterations: 6 
n= 397 

summary(fit.gau)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "gaussian")
         Value Std. Error     z        p
(Intercept) 768.15    72.6174  10.6 3.77e-26
Log(scale)    7.27     0.0372 195.4 0.00e+00

Scale= 1434 

Gaussian distribution
Loglik(model)= -3243.7   Loglik(intercept only)= -3243.7
Number of Newton-Raphson Iterations: 4 
n= 397 

summary(fit.log)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "lognormal")
        Value Std. Error    z         p
(Intercept) 4.986     0.1216 41.0  0.00e+00
Log(scale)  0.877     0.0373 23.5 1.71e-122

Scale= 2.4 

Log Normal distribution
Loglik(model)= -2624   Loglik(intercept only)= -2624
Number of Newton-Raphson Iterations: 5 
n= 397

I feel like I am particularly messing up the lognormal, given that it is not the standard shape-and-coefficient tandem but the mean and variance.

eclark · Accepted Answer

Try this; the idea is generating random variables using the random distribtion functions and then plotting the density functions with the output data, here is an example like you need:

require(ggplot2)
require(dplyr)
require(tidyr)

SampleData <- data.frame(Duration=rlnorm(n = 184,meanlog = 2.859,sdlog = .246)) #Asume this is data we have sampled from a lognormal distribution

#Then we estimate the parameters for different types of distributions for that sample data and come up for this parameters
#We then generate a dataframe with those distributions and parameters
Dist = data.frame(
  Weibull = rweibull(10000,shape = 1.995,scale = 22.386),
  Gamma = rgamma(n = 10000,shape = 4.203,scale = 4.699),
  LogNormal = rlnorm(n = 10000,meanlog = 2.859,sdlog = .246)
)

#We use gather to prepare the distribution data in a manner better suited for group plotting in ggplot2
Dist <- Dist %>% gather(Distribution,Duration)

#Create the plot that sample data as a histogram
G1 <- ggplot(SampleData,aes(x=Duration)) + geom_histogram(aes(,y=..density..),binwidth=5, colour="black", fill="white") 

#Add the density distributions of the different distributions with the estimated parameters
G2 <- G1 + geom_density(aes(x=Duration,color=Distribution),data=Dist)

plot(G2)

Fitting survival density curves using different distributions

Answers (1)

Related Questions