My question relates to calculating the standard deviation (SD) of transition probabilities derived from coefficients estimated through Weibull regression in Stata.
The transition probabilities model disease progression in leukemia patients over 40 cycles of 90 days (about 10 years). I need the SD of each probability (the probabilities change from cycle to cycle over the run of the Markov model) so that I can create beta distributions whose parameters are approximated from the corresponding cycle probability and its SD. These distributions are then used for probabilistic sensitivity analysis: they replace the point-estimate probabilities (one per cycle), and random draws from them are used to evaluate the robustness of the model’s cost-effectiveness results.
Using time-to-event survival data, I’ve used regression analysis to estimate coefficients that can be plugged into an equation to generate transition probabilities. For example:
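For reference, one common way to approximate beta parameters from a mean and an SD is the method of moments. The block below is a sketch that assumes this is the approximation intended; the symbols \mu_t (cycle probability), \sigma_t (its SD), \alpha_t and \beta_t are notation introduced here for illustration.

```latex
\alpha_t = \mu_t \left( \frac{\mu_t (1 - \mu_t)}{\sigma_t^{2}} - 1 \right), \qquad
\beta_t  = (1 - \mu_t) \left( \frac{\mu_t (1 - \mu_t)}{\sigma_t^{2}} - 1 \right), \qquad
\text{valid when } \sigma_t^{2} < \mu_t (1 - \mu_t).
```

With these values the beta distribution has mean \mu_t and variance \sigma_t^2, which is why an SD is needed for every cycle.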
. streg, nohr dist(weibull)
failure _d: event
analysis time _t: time
Fitting constant-only model:
Iteration 0: log likelihood = -171.82384
Iteration 1: log likelihood = -158.78902
Iteration 2: log likelihood = -158.64499
Iteration 3: log likelihood = -158.64497
Iteration 4: log likelihood = -158.64497
Fitting full model:
Iteration 0: log likelihood = -158.64497
Weibull regression -- log relative-hazard form
No. of subjects = 93 Number of obs = 93
No. of failures = 62
Time at risk = 60250
LR chi2(0) = -0.00
Log likelihood = -158.64497 Prob > chi2 = .
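          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -4.307123   .4483219    -9.61   0.000    -5.185818   -3.428429
-------------+----------------------------------------------------------------
       /ln_p |  -.4638212   .1020754    -4.54   0.000    -.6638854    -.263757
-------------+----------------------------------------------------------------
           p |    .628876   .0641928                      .5148471    .7681602
         1/p |   1.590139   .1623141                      1.301812    1.942324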
We then create the probabilities with an equation that uses p and _cons, as well as t for time (i.e., Markov cycle number) and u for cycle length (usually a year; mine is 90 days, since I’m working with leukemia patients who are very likely to have an event, i.e., relapse or die).
So where lambda = p, gamma = (exp(_cons))
gen result = (exp((lambda*((t-u)^ (gamma)))-(lambda*(t^(gamma)))))
gen transitions = 1-result
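For readers following the algebra, the two gen statements above can be read as one minus a ratio of Weibull survivor functions. The block is a sketch in notation introduced here (S for the survivor function, tp for the transition probability); it restates the formula already given and adds nothing to it.

```latex
S(t) = \exp\left(-\lambda t^{\gamma}\right), \qquad
tp(t) = 1 - \frac{S(t)}{S(t-u)}
      = 1 - \exp\left(\lambda (t-u)^{\gamma} - \lambda t^{\gamma}\right)
```

One point that may be worth checking: the streg documentation writes the Weibull survivor function as exp(-exp(xb) t^p), so the roles assigned to p and exp(_cons) above should be compared against that form before the probabilities are used.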
Turning to the variability, I first calculate the standard errors for the exponentiated coefficients:
. nlcom (exp(_b[_cons])) (exp(_b[/ln_p]))
_nl_1: exp(_b[_cons])
_nl_2: exp(_b[/ln_p])
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .0116539   .0044932     2.59   0.009     .0028474    .0204604
       _nl_2 |   .6153864    .054186    11.36   0.000     .5091838     .721589
But what I’m really after is the standard errors on the transitions values, e.g.,
nlcom (_b[transitions])
But this doesn’t work and the book I'm using doesn't give hints on getting at this extra info. Any feedback on how to get closer would be much appreciated.
sysuse auto, clear
gen u = 90 + rnormal()
set seed 1234
capture program drop _all
program define myprog, rclass
    tempvar result
    reg turn disp   /* Here substitute your -streg- statement */
    gen `result' = _b[disp]*u
    summarize `result'   /* leaves the SD of the new variable in r(sd) */
    return scalar sd = r(sd)
end
bootstrap sdr = r(sd): myprog
estat bootstrap, bc percentile
Of note: in the bootstrapped program, the new variable (your result) must be defined as temporary; otherwise the gen statement will lead to an error because the variable is created anew for each bootstrap replicate.
Update: 2014-03-26. I fixed the negative probabilities: I'd made an error in transcribing Emily's code. I also show nlcom, as suggested on Statalist by Austin Nichols; I made one correction to Austin's code.
Bootstrapping is still the key to the solution. The target quantities are probabilities calculated by a formula that is a nonlinear combination of the parameters estimated by streg. As these quantities are not contained in the matrix e(b) returned after streg, nlcom will not estimate their standard errors directly. This is an ideal situation for bootstrapping. The standard approach is adopted: create a program, myprog, to estimate the parameters; then bootstrap that program.
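For context, nlcom obtains its standard errors by the delta method, applied to expressions of the coefficients held in e(b). The block below is a sketch of that approximation; g, \hat\theta and \hat V are notation introduced here, with g standing for the transition probability formula at a given t and u, and \hat V for the coefficient covariance matrix e(V) from streg.

```latex
\hat\theta = \left( \texttt{\_b[\_t:\_cons]},\ \texttt{\_b[ln\_p:\_cons]} \right), \qquad
\widehat{\operatorname{Var}}\left[ g(\hat\theta) \right] \approx
\nabla g(\hat\theta)^{\top} \, \hat V \, \nabla g(\hat\theta), \qquad
\operatorname{SE} = \sqrt{ \widehat{\operatorname{Var}}\left[ g(\hat\theta) \right] }
```

This is why an expression written out in terms of _b[...] can be passed to nlcom, while a generated variable such as transitions cannot. The bootstrap avoids the linear approximation by re-estimating the model on resampled data.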
In the example, transition probabilities p_t for a range of t values are estimated. The user must set the minimum and maximum of the t range as well as a scalar u. Of interest, perhaps, is that, since the number of estimated parameters is variable, a forvalues loop is required inside myprog. Also, bootstrap requires an argument that consists of a list of the estimates returned by myprog. This list is also constructed in a forvalues loop.
/* set u and minimum and maximum times here */
scalar u = 1
local tmin = 1
local tmax = 3

set linesize 80
capture program drop _all
program define myprog, rclass
    syntax anything
    streg, nohr dist(weibull)
    scalar lambda = exp(_b[ln_p:_cons])
    scalar gamma = exp(_b[_t:_cons])
    /* one returned probability per value of t */
    forvalues t = `1'/`2' {
        scalar p`t' = 1 - ///
            (exp((lambda*((`t'-u)^(gamma)))-(lambda*(`t'^(gamma)))))
        return scalar p`t' = p`t'
    }
end

webuse cancer, clear
stset studytime, fail(died)
set seed 450811

/* set up list of returned probabilities for bootstrap */
forvalues t = `tmin'/`tmax' {
    local p`t' = "p" + string(`t')
    local rp`t' = "`p`t''" + "=" + "(" + "r(" + "`p`t''" + "))"
    local rlist = `"`rlist' `rp`t''"'
}
bootstrap `rlist', nodots: myprog `tmin' `tmax'

/* the same probabilities with delta-method standard errors from nlcom */
forvalues t = `tmin'/`tmax' {
    qui streg, nohr dist(weibull)
    nlcom 1 - ///
        (exp((exp(_b[ln_p:_cons])*((`t'-u)^(exp(_b[_t:_cons]))))- ///
        (exp(_b[ln_p:_cons])*(`t'^(exp(_b[_t:_cons]))))))
}
Bootstrap results Number of obs = 48
Replications = 50
command: myprog 1 3
p1: r(p1)
p2: r(p2)
p3: r(p3)
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          p1 |   .7009447   .0503893    13.91   0.000     .6021834    .7997059
          p2 |   .0187127    .007727     2.42   0.015     .0035681    .0338573
          p3 |   .0111243   .0047095     2.36   0.018     .0018939    .0203548
/* results of nlcom */
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .7009447   .0543671    12.89   0.000      .594387    .8075023
       _nl_1 |   .0187127   .0082077     2.28   0.023     .0026259    .0347995
       _nl_1 |   .0111243   .0049765     2.24   0.025     .0013706    .0208781
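To connect these results back to the beta distributions in the question, one possible next step is to convert each bootstrapped probability and its standard error into beta parameters by the method of moments. The sketch below is an assumption about how that step might look and is not part of the original code: the matrix names bs_prob and bs_se and the local macro names are made up for illustration. It has to run immediately after bootstrap, before the nlcom loop refits streg, so that e(b) and e(se) still hold the bootstrap results.

```stata
* Copy the bootstrap results out of e() before another command replaces them
matrix bs_prob = e(b)
matrix bs_se   = e(se)
local ncycle = colsof(bs_prob)

forvalues j = 1/`ncycle' {
    local m = bs_prob[1, `j']
    local s = bs_se[1, `j']
    * Method of moments: alpha = m*c and beta = (1 - m)*c,
    * where c = m*(1 - m)/s^2 - 1 (only meaningful when c > 0)
    local c     = `m'*(1 - `m')/(`s'^2) - 1
    local alpha = `m'*`c'
    local beta  = (1 - `m')*`c'
    display "column `j': alpha = `alpha'   beta = `beta'"
}
```

Local macros are used rather than scalars so that the names cannot be mistaken for abbreviated variable names in the dataset. Draws for the sensitivity analysis could then be generated with rbeta() using each pair of parameters.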
