ciflikli
ciflikli

Reputation: 31

Fitting survival density curves using different distributions

I am working with some log-normal data, and naturally I want to demonstrate log-normal distribution results in a better overlap than other possible distributions. Essentially, I want to replicate the following graph with my data:

fitted density curves

where the fitted density curves are juxtaposed over log(time).

The text where the linked image is from describes the process as fitting each model and obtaining the following parameters:

table

For that purpose, I fitted four naive survival models with the above-mentioned distributions:

survreg(Surv(time,event)~1,dist="family")

and extracted the shape parameter (α) and the coefficient (β).

I have several questions regarding the process:

1) Is this the right way of going about it? I have looked into several R packages but couldn't locate one that plots density curves as a built-in function, so I feel like I must be overlooking something obvious.

2) Do the values corresponding log-normal distribution (μ and σ$^2$) just the mean and the variance of the intercept?

3) How can I create a similar table in R? (Maybe this is more of a stack overflow question) I know I can just cbind them manually, but I am more interested in calling them from the fitted models. survreg objects store the coefficient estimates, but calling survreg.obj$coefficients results a named number vector (instead of just a number).

4) Most importantly, how can I plot a similar graph? I thought it would be fairly simple if I just extract the parameters and plot them over the histrogram, but so far no luck. The author of the text says he estimated the density curves from the parameters, but I just get a point estimate - what am I missing? Should I calculate the density curves manually based on distribution before plotting?

I am not sure how to provide a mwe in this case, but honestly I just need a general solution for adding multiple density curves to survival data. On the other hand, if you think it will help, feel free to recommend a mwe solution and I will try to produce one.

Thanks for your input!

Edit: Based on eclark's post, I have made some progress. My parameters are:

Dist = data.frame(
Exponential = rweibull(n = 10000, shape = 1, scale = 6.636684),
Weibull = rweibull(n = 10000, shape = 6.068786, scale = 2.002165),
Gamma = rgamma(n = 10000, shape = 768.1476, scale = 1433.986),
LogNormal = rlnorm(n = 10000, meanlog = 4.986, sdlog = .877)
)

However, given the massive difference in scales, this is what I get:

plot

Going back to question number 3, is this how I should get the parameters? Currently this is how I do it (sorry for the mess):

summary(fit.exp)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "exponential")
        Value Std. Error   z p
(Intercept)  6.64      0.052 128 0

Scale fixed at 1 

Exponential distribution
Loglik(model)= -2825.6   Loglik(intercept only)= -2825.6
Number of Newton-Raphson Iterations: 6 
n= 397 

summary(fit.wei)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "weibull")
        Value Std. Error    z        p
(Intercept) 6.069     0.1075 56.5 0.00e+00
Log(scale)  0.694     0.0411 16.9 6.99e-64

Scale= 2 

Weibull distribution
Loglik(model)= -2622.2   Loglik(intercept only)= -2622.2
Number of Newton-Raphson Iterations: 6 
n= 397 

summary(fit.gau)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "gaussian")
         Value Std. Error     z        p
(Intercept) 768.15    72.6174  10.6 3.77e-26
Log(scale)    7.27     0.0372 195.4 0.00e+00

Scale= 1434 

Gaussian distribution
Loglik(model)= -3243.7   Loglik(intercept only)= -3243.7
Number of Newton-Raphson Iterations: 4 
n= 397 

summary(fit.log)

Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na, 
dist = "lognormal")
        Value Std. Error    z         p
(Intercept) 4.986     0.1216 41.0  0.00e+00
Log(scale)  0.877     0.0373 23.5 1.71e-122

Scale= 2.4 

Log Normal distribution
Loglik(model)= -2624   Loglik(intercept only)= -2624
Number of Newton-Raphson Iterations: 5 
n= 397 

I feel like I am particularly messing up the lognormal, given that it is not the standard shape-and-coefficient tandem but the mean and variance.

Upvotes: 3

Views: 1353

Answers (1)

eclark
eclark

Reputation: 819

Try this; the idea is generating random variables using the random distribtion functions and then plotting the density functions with the output data, here is an example like you need:

require(ggplot2)
require(dplyr)
require(tidyr)

SampleData <- data.frame(Duration=rlnorm(n = 184,meanlog = 2.859,sdlog = .246)) #Asume this is data we have sampled from a lognormal distribution

#Then we estimate the parameters for different types of distributions for that sample data and come up for this parameters
#We then generate a dataframe with those distributions and parameters
Dist = data.frame(
  Weibull = rweibull(10000,shape = 1.995,scale = 22.386),
  Gamma = rgamma(n = 10000,shape = 4.203,scale = 4.699),
  LogNormal = rlnorm(n = 10000,meanlog = 2.859,sdlog = .246)
)

#We use gather to prepare the distribution data in a manner better suited for group plotting in ggplot2
Dist <- Dist %>% gather(Distribution,Duration)

#Create the plot that sample data as a histogram
G1 <- ggplot(SampleData,aes(x=Duration)) + geom_histogram(aes(,y=..density..),binwidth=5, colour="black", fill="white") 

#Add the density distributions of the different distributions with the estimated parameters
G2 <- G1 + geom_density(aes(x=Duration,color=Distribution),data=Dist)

plot(G2)

enter image description here

Upvotes: 2

Related Questions