Reputation: 164
I hope to find weibull shape and scale parameters for a distribution that is left truncated using R's fitdistr function (MLE). Using a sample of data of tree diameters (the smallest of which being 2.8):
data<-c(42.7,18.8,30.0,20.3,32.5,18.8,16.0,42.9,18.8,17.3,21.1,23.4,15.0,16.8,15.2,15.0,14.7,17.3,20.1,18.3,16.0,15.7,21.3,
19.1,17.3,17.0,17.3,17.5,21.6,15.7,12.7,13.2,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,2.8,2.8,2.8,2.8,2.8,
2.8,2.8,2.8,2.8,2.8,2.8,2.8,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,4.3,4.3,4.3,4.3,4.3,4.3,
4.3,4.3,4.3,4.3,4.3,4.3,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,
5.6,5.6,18.0,16.3,34.8,17.5,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,
6.3,6.3,6.3,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8)
library(MASS)
wb<-fitdistr(data,'weibull',lower=0.1) # MLE for weibull parameter determination
wb
Results:
shape scale
1.36605920 9.97356797
Given the data distribution, one would expect a negative monotonic curve (e.g. shape<1). However, these results indicate shape>1 because fitdistr did not take into account the fact that the data was left truncated. Elsewhere, the following was suggested:
ltwei<-function(x,shape,scale=1,log=FALSE){dweibull(x,shape,scale,log)/pweibull(1,shape,scale,lower=FALSE) }
ltweifit<-fitdistr(data,ltwei,start=list(shape=1,scale=10))
ltweifit
but this results in an even larger weibull shape value. How can I produce shape and scale parameters for the distribution that take the data's left truncation into account? Many thanks in advance.
Upvotes: 3
Views: 3002
Reputation: 263301
This is a modification of the advice given because I thought the denominator was incorrect. Trying to change the denominator to be 1-pweibull(trunc, ...)
See if this produces a better fit:
> ptwei <- function(x, shape, scale , log = FALSE)
+ pweibull(x, shape, scale, log)/(1-
+ pweibull(2.8, shape, scale, lower=FALSE))
> dtwei <- function(x, shape, scale , log = FALSE)
+ dweibull(x, shape, scale, log)/(1-
+ pweibull(2.8, shape, scale, lower=FALSE))
> fitdist(data, dtwei, start=list(shape=1.2, scale=4))
Fitting of the distribution ' twei ' by maximum likelihood
Parameters:
estimate Std. Error
shape 0.4555414 0.02660275
scale 0.8751571 0.12236256
I've now realized the Prof Ripley was using the lower argument to achieve what I was doing above, so the original code would work as long as the ltwei
function was called with lower = FALSE
This is following the advice of Prof. Brian Ripley found on Rhelp posting of Oct 7, 2008:
ltwei <- function(x, shape, scale = 1, log = FALSE)
dweibull(x, shape, scale, log)/
pweibull(2.8, shape, scale, lower=FALSE)
The weibull density is normalized by dividing by the CMF at the truncation point. (With lower=FALSE the pweibull function returns the integrated density from the truncation point to Inf
.)
library(MASS)
wb<-fitdistr(data,ltwei,start=list(shape=1,scale=1) )
There were 50 or more warnings (use warnings() to see the first 50)
> wb
shape scale
1.81253163 12.72912552
( 0.07199877) ( 0.54855731)
> warnings()[1:5]
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
5: In dweibull(x, shape, scale, log) : NaNs produced
> library(MASS)
> wb<-fitdistr(data,ltwei,start=list(shape=1.1,scale=10) )
> wb
shape scale
1.81253113 12.72912870
( 0.07199877) ( 0.54855782)
You can see that some starting values generate warnings, but that the algorithm still succeeds. If you start with values that are closer to the "true values" the warnings don't appear. I'm afraid that your expectation was not met. Sometimes there is confusion about parametrization of the Weibull distribution because there are different conventions about how they are handled. This is what Terry Therneau has in his survival::survreg help page:
# There are multiple ways to parameterize a Weibull distribution. The survreg
# function imbeds it in a general location-scale familiy, which is a
# different parameterization than the rweibull function, and often leads
# to confusion.
# survreg's scale = 1/(rweibull shape)
# survreg's intercept = log(rweibull scale)
# For the log-likelihood all parameterizations lead to the same value.
Since you seemed unhappy with the results of fitdistr
I also ran fitdist
from the 'fitdistrplus'-package and got pretty much the same answer. I still think you need to examine the interpretation of the parameters and probably also the critical presumption that this is necessarily data that is Weibull-distributed:
dtwei <- function(x, shape, scale , log = FALSE)
dweibull(x, shape, scale, log)/
pweibull(2.8, shape, scale, lower=FALSE)
ptwei <- function(x, shape, scale , log = FALSE)
pweibull(x, shape, scale, log)/
pweibull(2.8, shape, scale, lower=FALSE)
fitdist(data, dtwei, start=list(shape=1.2, scale=4))
#----------------
Fitting of the distribution ' twei ' by maximum likelihood
Parameters:
estimate Std. Error
shape 1.812706 0.07199987
scale 12.728087 0.54839034
Upvotes: 1