treetopdewdrop
treetopdewdrop

Reputation: 164

How to determine weibull parameters for a left truncated distribution using fitdistr?

I hope to find weibull shape and scale parameters for a distribution that is left truncated using R's fitdistr function (MLE). Using a sample of data of tree diameters (the smallest of which being 2.8):

data<-c(42.7,18.8,30.0,20.3,32.5,18.8,16.0,42.9,18.8,17.3,21.1,23.4,15.0,16.8,15.2,15.0,14.7,17.3,20.1,18.3,16.0,15.7,21.3,
    19.1,17.3,17.0,17.3,17.5,21.6,15.7,12.7,13.2,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,2.8,2.8,2.8,2.8,2.8,
    2.8,2.8,2.8,2.8,2.8,2.8,2.8,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,4.3,4.3,4.3,4.3,4.3,4.3,
    4.3,4.3,4.3,4.3,4.3,4.3,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,
    5.6,5.6,18.0,16.3,34.8,17.5,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,
    6.3,6.3,6.3,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8)

library(MASS)
wb<-fitdistr(data,'weibull',lower=0.1) # MLE for weibull parameter determination
wb

Results: shape scale
1.36605920 9.97356797

Given the data distribution, one would expect a negative monotonic curve (e.g. shape<1). However, these results indicate shape>1 because fitdistr did not take into account the fact that the data was left truncated. Elsewhere, the following was suggested:

ltwei<-function(x,shape,scale=1,log=FALSE){dweibull(x,shape,scale,log)/pweibull(1,shape,scale,lower=FALSE) } 
ltweifit<-fitdistr(data,ltwei,start=list(shape=1,scale=10)) 
ltweifit 

but this results in an even larger weibull shape value. How can I produce shape and scale parameters for the distribution that take the data's left truncation into account? Many thanks in advance.

Upvotes: 3

Views: 3002

Answers (1)

IRTFM
IRTFM

Reputation: 263301

This is a modification of the advice given because I thought the denominator was incorrect. Trying to change the denominator to be 1-pweibull(trunc, ...) See if this produces a better fit:

> ptwei <- function(x, shape, scale , log = FALSE) 
+               pweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> dtwei <- function(x, shape, scale , log = FALSE) 
+               dweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> fitdist(data, dtwei, start=list(shape=1.2, scale=4))
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 0.4555414 0.02660275
scale 0.8751571 0.12236256

I've now realized the Prof Ripley was using the lower argument to achieve what I was doing above, so the original code would work as long as the ltwei function was called with lower = FALSE

-----------

This is following the advice of Prof. Brian Ripley found on Rhelp posting of Oct 7, 2008:

ltwei <- function(x, shape, scale = 1, log = FALSE) 
              dweibull(x, shape, scale, log)/
                pweibull(2.8, shape, scale, lower=FALSE)

The weibull density is normalized by dividing by the CMF at the truncation point. (With lower=FALSE the pweibull function returns the integrated density from the truncation point to Inf.)

library(MASS)
wb<-fitdistr(data,ltwei,start=list(shape=1,scale=1) ) 

There were 50 or more warnings (use warnings() to see the first 50)
> wb
      shape         scale   
   1.81253163   12.72912552 
 ( 0.07199877) ( 0.54855731)
> warnings()[1:5]
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
5: In dweibull(x, shape, scale, log) : NaNs produced
> library(MASS)
> wb<-fitdistr(data,ltwei,start=list(shape=1.1,scale=10) ) 
> wb
      shape         scale   
   1.81253113   12.72912870 
 ( 0.07199877) ( 0.54855782)

You can see that some starting values generate warnings, but that the algorithm still succeeds. If you start with values that are closer to the "true values" the warnings don't appear. I'm afraid that your expectation was not met. Sometimes there is confusion about parametrization of the Weibull distribution because there are different conventions about how they are handled. This is what Terry Therneau has in his survival::survreg help page:

# There are multiple ways to parameterize a Weibull distribution. The survreg 
# function imbeds it in a general location-scale familiy, which is a 
# different parameterization than the rweibull function, and often leads
# to confusion.
#   survreg's scale  =    1/(rweibull shape)
#   survreg's intercept = log(rweibull scale)
#   For the log-likelihood all parameterizations lead to the same value.

Since you seemed unhappy with the results of fitdistr I also ran fitdist from the 'fitdistrplus'-package and got pretty much the same answer. I still think you need to examine the interpretation of the parameters and probably also the critical presumption that this is necessarily data that is Weibull-distributed:

 dtwei <- function(x, shape, scale , log = FALSE) 
               dweibull(x, shape, scale, log)/
                 pweibull(2.8, shape, scale, lower=FALSE)
 
ptwei <- function(x, shape, scale , log = FALSE) 
               pweibull(x, shape, scale, log)/
                 pweibull(2.8, shape, scale, lower=FALSE)
fitdist(data, dtwei, start=list(shape=1.2, scale=4))
#----------------
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape  1.812706 0.07199987
scale 12.728087 0.54839034

Upvotes: 1

Related Questions