Sanaullah Khan
Sanaullah Khan

Reputation: 21

Truncated Maximum Likelihood estimator

I am trying to estimate the parameters of Truncated New Generalized Poisson Lindly distribution.

Here in this distribution, we use two parameters which are alpha1, beta1, When I run Optimum function then I should get the values of estimators alpha1, beta1.

I get those values in R, the value of alpha1 I get is correct, But the problem is that I am getting the incorrect value of beta1.

I have tried to run this in R, but the estimate of beta1 is incorrect.

## Data Generation
## PMF of New Generalized Poisson Lindley distribution

marginal=function(alpha1, beta1,x1){
  ab=(alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
  return(ab)
}

alpha1=0.2;beta1=3;Tr=3

t1=c();
for(i in 1:10000){
  a=0:1000
  pr1=marginal(alpha1,beta1,a)
  y1=sample(a,1,replace=TRUE,prob=pr1);
  t1[i]=y1;
}
t1
##_________________###_____________________##___________
length(t1)
summary(t1)
hist(t1)
# Mean and Variance of Untruncated NGPLD

#Mean of Untruncated NGPLD 
U1=function(alpha1,beta1){
  U1=(alpha1+2*beta1)/(alpha1*(beta1+alpha1))
  return(sum(U1))
}
U1(alpha1,beta1);mean(t1)

# Variance of Untruncated NGPLD
var1=function(alpha1,beta1){
  var1=(2*beta1^2*(1+alpha1)+alpha1^2*(1+alpha1)+alpha1*beta1*(4+3*alpha1))/(alpha1^2*(alpha1+beta1)^2)
  return(sum(var1))
}
var1(alpha1,beta1);var(t1)

#Truncated data
Tx=t1[t1>Tr];Tx
hist(Tx)
length(Tx)
#Truncated NEw genralized poisson lindley Disrtribution ,where
#Truncated NGPLD=(PMF/1-Cf),where (1-Cf) is a Survival function

#survival Function
x=Tr
cf=((alpha1+beta1)*(1+alpha1)^(x+2)-(alpha1+alpha1^2+2*alpha1*beta1+alpha1*beta1*x+beta1))/((alpha1+beta1)*(1+alpha1)^(x+2))
1-cf

#Truncated NEw genralized poisson lindley Disrtribution 
UT <-function(x1,alpha1,beta1,Tr){
  x=Tr
  cf=((alpha1+beta1)*(1+alpha1)^(x+2)-(alpha1+alpha1^2+2*alpha1*beta1+alpha1*beta1*x+beta1))/((alpha1+beta1)*(1+alpha1)^(x+2))
  numerator =alpha1^2 * (1 + alpha1 + beta1 + beta1 * x1)
  denominator = (alpha1 + beta1) * (alpha1 + 1)^(x1 + 2)
  result=numerator/denominator
  result1=result/(1-cf)
  return(result1)
}
UT(4:100,0.2,3,3)
#sum of probilities of Truncated NGPLD is one
sum(UT(4:100,0.2,3,3))
plot(Tx,UT(Tx,alpha1,beta1,Tr))

#Mean and Varince of Truncated NGPLD

#mean of Truncated NGPLD
f1=function(alpha1,beta1,Tr){
  x1=Tr+1:100
  f1=(x1*alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
  return(sum(f1))
}
X=f1(alpha1,beta1,Tr)/(1-cf)
X;mean(Tx)

#varince of Truncated NGPLD 

f3=function(alpha1,beta1,Tr){
  x1=Tr+1:100
  f3=(x1*(x1-1)*alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
  return(sum(f3))
}

vx=f3(alpha1,beta1,Tr)/(1-cf)+f1(alpha1,beta1,Tr)/(1-cf)-(f1(alpha1,beta1,Tr)/(1-cf))^2
vx;var(Tx)



#likelihood function of Truncated NGPLD
lik=function(par,data){
  x1<-data
  alpha1=par[1];beta1=par[2];
  ll=-sum(log(UT(Tx, alpha1, beta1,Tr)))
  return(ll)}

# MLE Of Truncated NGPLD
pp=optim(par=c(alph1=1,beta1=2),fn=lik,data=Tx,
         control = list(parscale=c(alph1=1,beta1=1)))
#estimated parameter 
ap1=pp$par[1];bt1=pp$par[2];ap1;bt1


Upvotes: 1

Views: 203

Answers (1)

Sandipan Dey
Sandipan Dey

Reputation: 23109

Just changed your negative log-likelihood function to use x1 instead of Tx and it worked for me

nllik <- function(par, data, Tr) { # optionally pass additional constant param which is not an optimization variable
  x1 <- data
  alpha1 <- par[1]
  beta1 <- par[2]
  ll <- -sum(log(UT(x1, alpha1, beta1, Tr)))  # change Tx to x1
  return(ll)
}

# set.seed(1) before data generation to reproduce
pp <- optim(par = c(alph1=1, beta1=2), fn = nllik, data = Tx, Tr = Tr,
                   control = list(parscale = c(alph1=1, beta1=1)))
#estimated parameters
ap1 = pp$par[1]; bt1 = pp$par[2]
ap1; bt1
# alph1 
# 0.20076 
# beta1 
# 2.845234 

The next plot shows the negative log-likelihood function surface as a function of alpha and beta, the function optim() tries to minimize the function, starting from the initial point.

library(plot3D)
m <- 500
n <- 25
x <- seq(0.1, 0.3, length=m) 
#x <- c(seq(-1, -0.01, length=25), seq(0.01, 1, length=25))
y <- seq(2, 4, length=n)
z <- matrix(rep(0, m*n), nrow=m)
for (i in 1:length(x)) {
  for (j in 1:length(y)) {
    z[i,j] <- llik(c(x[i], y[j]), Tx, Tr)
  }
}
persp3D(x, y, z, #col=colvec, #col=jet.col(256), theta = 30, phi = 30, expand = 0.5,
        ticktype='detailed', shade=0.05, contour=TRUE, lighting=TRUE, #curtain=TRUE,
        xlab="alpha", ylab="beta",zlab="nll", main='NLL')

enter image description here

Here is the distribution with MLE fitted parameters, the density plotted on data histogram:

Tx = sort(Tx)
hist(Tx, probability = TRUE, main = 'data & MLE fit', col = 'gray')
lines(Tx,UT(Tx,ap1,bt1,Tr), col='red')
legend("topright", legend=c("data", "MLE"), lty = c(NA, 1), border = c("gray", NA), fill = c("gray",NA), col = c(NA, 'red'), x.intersp=c(-1.5,0.5))

enter image description here

EDIT

Take a closer look at the contour of the NLL function, which we are aiming to minimize:

filled.contour(x, y, z, xlab="alpha", ylab="beta",zlab="nll", main='NLL contour')

enter image description here

Note that the nllik function is very highly sensitive with the change in value of alpha, but less sensitive with the change in the value of beta beyond a point.

beta <- seq(0.1, 10,length.out=100)
alpha <- c(0.19, 0.2, 0.21)

df <- NULL
for (a in alpha) {
  for (b in beta) {
    df <- rbind(df, data.frame(alpha=a, beta=b, nll=llik(c(a,b), Tx, Tr=3)))
  }
}
head(df)
library(ggplot2)
df$alpha <- as.factor(round(df$alpha,3))
ggplot(df, aes(beta, nll, group=alpha, color=alpha)) + geom_line(lwd=1.25) + 
  scale_color_manual(values=c('red', 'green', 'blue')) +
  theme_bw()

enter image description here

Note that the function keeps on decreasing (though the rate of decrease is small) till beta=10 when alpha=0.205, that's why optim() finds high beta values for this particular value of alpha (is it as per expectation? if not, you may want to double-check the function UT() for the computation of the pmf for the TNGPL distribution from the inverse CDF)

This is a good resource to learn more about

A New Generalized Poisson-Lindley Distribution

It's good to include a reference for a distribution which is not common.

Upvotes: 2

Related Questions