Reputation: 21
I am trying to estimate the parameters of Truncated New Generalized Poisson Lindly distribution.
Here in this distribution, we use two parameters which are alpha1
, beta1
, When I run Optimum function then I should get the values of estimators alpha1
, beta1
.
I get those values in R, the value of alpha1
I get is correct,
But the problem is that I am getting the incorrect value of beta1
.
I have tried to run this in R
, but the estimate of beta1
is incorrect.
## Data Generation
## PMF of New Generalized Poisson Lindley distribution
marginal=function(alpha1, beta1,x1){
ab=(alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
return(ab)
}
alpha1=0.2;beta1=3;Tr=3
t1=c();
for(i in 1:10000){
a=0:1000
pr1=marginal(alpha1,beta1,a)
y1=sample(a,1,replace=TRUE,prob=pr1);
t1[i]=y1;
}
t1
##_________________###_____________________##___________
length(t1)
summary(t1)
hist(t1)
# Mean and Variance of Untruncated NGPLD
#Mean of Untruncated NGPLD
U1=function(alpha1,beta1){
U1=(alpha1+2*beta1)/(alpha1*(beta1+alpha1))
return(sum(U1))
}
U1(alpha1,beta1);mean(t1)
# Variance of Untruncated NGPLD
var1=function(alpha1,beta1){
var1=(2*beta1^2*(1+alpha1)+alpha1^2*(1+alpha1)+alpha1*beta1*(4+3*alpha1))/(alpha1^2*(alpha1+beta1)^2)
return(sum(var1))
}
var1(alpha1,beta1);var(t1)
#Truncated data
Tx=t1[t1>Tr];Tx
hist(Tx)
length(Tx)
#Truncated NEw genralized poisson lindley Disrtribution ,where
#Truncated NGPLD=(PMF/1-Cf),where (1-Cf) is a Survival function
#survival Function
x=Tr
cf=((alpha1+beta1)*(1+alpha1)^(x+2)-(alpha1+alpha1^2+2*alpha1*beta1+alpha1*beta1*x+beta1))/((alpha1+beta1)*(1+alpha1)^(x+2))
1-cf
#Truncated NEw genralized poisson lindley Disrtribution
UT <-function(x1,alpha1,beta1,Tr){
x=Tr
cf=((alpha1+beta1)*(1+alpha1)^(x+2)-(alpha1+alpha1^2+2*alpha1*beta1+alpha1*beta1*x+beta1))/((alpha1+beta1)*(1+alpha1)^(x+2))
numerator =alpha1^2 * (1 + alpha1 + beta1 + beta1 * x1)
denominator = (alpha1 + beta1) * (alpha1 + 1)^(x1 + 2)
result=numerator/denominator
result1=result/(1-cf)
return(result1)
}
UT(4:100,0.2,3,3)
#sum of probilities of Truncated NGPLD is one
sum(UT(4:100,0.2,3,3))
plot(Tx,UT(Tx,alpha1,beta1,Tr))
#Mean and Varince of Truncated NGPLD
#mean of Truncated NGPLD
f1=function(alpha1,beta1,Tr){
x1=Tr+1:100
f1=(x1*alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
return(sum(f1))
}
X=f1(alpha1,beta1,Tr)/(1-cf)
X;mean(Tx)
#varince of Truncated NGPLD
f3=function(alpha1,beta1,Tr){
x1=Tr+1:100
f3=(x1*(x1-1)*alpha1^2*(1+alpha1+beta1+beta1*x1))/((alpha1+beta1)*(alpha1+1)^(x1+2))
return(sum(f3))
}
vx=f3(alpha1,beta1,Tr)/(1-cf)+f1(alpha1,beta1,Tr)/(1-cf)-(f1(alpha1,beta1,Tr)/(1-cf))^2
vx;var(Tx)
#likelihood function of Truncated NGPLD
lik=function(par,data){
x1<-data
alpha1=par[1];beta1=par[2];
ll=-sum(log(UT(Tx, alpha1, beta1,Tr)))
return(ll)}
# MLE Of Truncated NGPLD
pp=optim(par=c(alph1=1,beta1=2),fn=lik,data=Tx,
control = list(parscale=c(alph1=1,beta1=1)))
#estimated parameter
ap1=pp$par[1];bt1=pp$par[2];ap1;bt1
Upvotes: 1
Views: 203
Reputation: 23109
Just changed your negative log-likelihood
function to use x1
instead of Tx
and it worked for me
nllik <- function(par, data, Tr) { # optionally pass additional constant param which is not an optimization variable
x1 <- data
alpha1 <- par[1]
beta1 <- par[2]
ll <- -sum(log(UT(x1, alpha1, beta1, Tr))) # change Tx to x1
return(ll)
}
# set.seed(1) before data generation to reproduce
pp <- optim(par = c(alph1=1, beta1=2), fn = nllik, data = Tx, Tr = Tr,
control = list(parscale = c(alph1=1, beta1=1)))
#estimated parameters
ap1 = pp$par[1]; bt1 = pp$par[2]
ap1; bt1
# alph1
# 0.20076
# beta1
# 2.845234
The next plot shows the negative log-likelihood
function surface as a function of alpha
and beta
, the function optim()
tries to minimize the function, starting from the initial point.
library(plot3D)
m <- 500
n <- 25
x <- seq(0.1, 0.3, length=m)
#x <- c(seq(-1, -0.01, length=25), seq(0.01, 1, length=25))
y <- seq(2, 4, length=n)
z <- matrix(rep(0, m*n), nrow=m)
for (i in 1:length(x)) {
for (j in 1:length(y)) {
z[i,j] <- llik(c(x[i], y[j]), Tx, Tr)
}
}
persp3D(x, y, z, #col=colvec, #col=jet.col(256), theta = 30, phi = 30, expand = 0.5,
ticktype='detailed', shade=0.05, contour=TRUE, lighting=TRUE, #curtain=TRUE,
xlab="alpha", ylab="beta",zlab="nll", main='NLL')
Here is the distribution with MLE fitted parameters, the density plotted on data histogram:
Tx = sort(Tx)
hist(Tx, probability = TRUE, main = 'data & MLE fit', col = 'gray')
lines(Tx,UT(Tx,ap1,bt1,Tr), col='red')
legend("topright", legend=c("data", "MLE"), lty = c(NA, 1), border = c("gray", NA), fill = c("gray",NA), col = c(NA, 'red'), x.intersp=c(-1.5,0.5))
Take a closer look at the contour of the NLL
function, which we are aiming to minimize:
filled.contour(x, y, z, xlab="alpha", ylab="beta",zlab="nll", main='NLL contour')
Note that the nllik
function is very highly sensitive with the change in value of alpha
, but less sensitive with the change in the value of beta
beyond a point.
beta <- seq(0.1, 10,length.out=100)
alpha <- c(0.19, 0.2, 0.21)
df <- NULL
for (a in alpha) {
for (b in beta) {
df <- rbind(df, data.frame(alpha=a, beta=b, nll=llik(c(a,b), Tx, Tr=3)))
}
}
head(df)
library(ggplot2)
df$alpha <- as.factor(round(df$alpha,3))
ggplot(df, aes(beta, nll, group=alpha, color=alpha)) + geom_line(lwd=1.25) +
scale_color_manual(values=c('red', 'green', 'blue')) +
theme_bw()
Note that the function keeps on decreasing (though the rate of decrease is small) till beta=10
when alpha=0.205
, that's why optim()
finds high beta
values for this particular value of alpha
(is it as per expectation? if not, you may want to double-check the function UT()
for the computation of the pmf for the TNGPL
distribution from the inverse CDF)
This is a good resource to learn more about
A New Generalized Poisson-Lindley Distribution
It's good to include a reference for a distribution which is not common.
Upvotes: 2