Reputation: 99
I want to find estimate parameter with optim() package in R. And I compare my result with GLM model in R. The code is
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
oreduced <- glm(disease~age+sector, family=binomial(link=logit), data=d)
summary(oreduced)
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
nlldbin=function(param){
eta<-param[1]+param[2]*x1+param[3]*x2
p<-1/(1+exp(-eta))
-sum(y*log(p)+(1-y)*log(1-p),na.rm=TRUE)
}
MLE_estimates<-optim(c(Intercept=0.1,age=0.1,sector2=0.1),nlldbin,hessian=TRUE)
MLE_estimatesenter
The result with GLM model
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.15966 0.34388 -6.280 3.38e-10 ***
age 0.02681 0.00865 3.100 0.001936 **
sector2 1.18169 0.33696 3.507 0.000453 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
And with optim()
$par
Intercept age sector2
-3.34005918 0.02680405 1.18101449
Can someone please tell me why its different and how to fix this? Thank you
Upvotes: 1
Views: 182
Reputation: 1610
You've given R two different problems. In your GLM, all of the parameters in the formula are factor variables. This mean that you've told R that they can only take particular values (e.g. d$disease
can only take values 0 and 1). In your MLE approach, you've converted them to numeric variables, meaning that they can take any value and that your data just happens to use a small set of values.
The "fix" is to only give R one problem to solve. For example, if you instead fit glm(y~x1+x2, family=binomial(link=logit))
, which uses no factor variables, you get pretty much the same parameter estimates with both the MLE as with the fitted model. You've seen this before.
Upvotes: 1