Jasmine Helen
Jasmine Helen

Reputation: 99

R: The diferrence of estimate parameter between GLM model and optim() package

My purpose is to find estimate parameters with optim() package in R. And I compare my result with GLM model in R. The code is

d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)

oreduced <- glm(disease~age+sector, family=binomial(link=logit), data=d)
summary(oreduced)

y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))

nlldbin=function(param){
  eta<-param[1]+param[2]*x1+param[3]*x2
  p<-1/(1+exp(-eta))
  -sum(y*log(p)+(1-y)*log(1-p),na.rm=TRUE)
}
MLE_estimates<-optim(c(Intercept=0.1,age=0.1,sector2=0.1),nlldbin,hessian=TRUE)

MLE_estimatesenter

Wih GlM the result is

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.15966    0.34388  -6.280 3.38e-10 ***
age          0.02681    0.00865   3.100 0.001936 ** 
sector2      1.18169    0.33696   3.507 0.000453 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And with optim()

$par
  Intercept         age     sector2 
-3.34005918  0.02680405  1.18101449 

Someone please tell me why it's different? and how to fix this? Thank you

Upvotes: 0

Views: 74

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269654

x2 in the optim version has the wrong coding. Try:

nlldbin <- function(param) {
  eta <- param[1] + param[2] * x1 + param[3] * (x2 == 2)
  p <- 1 / (1 + exp(-eta))
  - sum(y * log(p) + (1-y) * log(1-p), na.rm = TRUE)
}
st <- c(Intercept = 0.1, age = 0.1, sector2 = 0.1)
MLE_estimates <- optim(st, nlldbin, hessian = TRUE)

MLE_estimates$par
##   Intercept         age     sector2 
## -2.15932867  0.02680381  1.18158898 

coef(oreduced)
## (Intercept)         age     sector2 
## -2.15965912  0.02681289  1.18169345 

Note that the last two lines in nlldbin could be written as follows if you are willing to make use of dbinom and plogis:

-sum(dbinom(y, 1, plogis(eta), log = TRUE))

Upvotes: 1

Related Questions