Reputation: 327
I am struggling with a problem on estimation. In several instances I have shown how to calculate a Bivariate Ellipse using a vector of points generated by Bivariate Normal distribution. The code works fine except that the coverage (the number of times the generated or true ps(p1,p2) are contained in the estimated ellipse) I am getting seems extremely low. I should also state that the old version of R gave significantly different results compared to the new version. I am now using R 3.0.1. Here is the code to be able to reproduce the problem.
library(MASS)
set.seed(1234)
x1<-NULL
x2<-NULL
k<-1
Sigma2 <- matrix(c(.72,.57,.57,.46),2,2)
Sigma2
rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2])
eta<-replicate(300,mvrnorm(k, mu=c(-1.01,-2.39), Sigma2))
p1<-exp(eta)/(1+exp(eta)) # true p's
n<-60
x1<-replicate(300,rbinom(k,n,p1[1,]))
x2<-replicate(300,rbinom(k,n,p1[2,]))
rate1<-x1/60 # Estimated p's
rate2<-x2/60
library(car)
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95))
library(sp)
within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
mean(within) # coverage
Upvotes: 0
Views: 179
Reputation: 1706
The mistake lies in the lines:
x1<-replicate(300,rbinom(k,n,p1[1,]))
x2<-replicate(300,rbinom(k,n,p1[2,]))
Because k=1
, the call rbinom(k,n,p1[1,])
generates a single random deviate, and only the first probability in p1[1,]
is used. You are replicating this call 300 times, and so are using the same probability for each deviate. Therefore, rate1
and rate2
occupy a far smaller parameter space than does p1
. Visualize this by plotting p1
over your data ellipse:
x1<-replicate(300,rbinom(k,n,p1[1,]))
x2<-replicate(300,rbinom(k,n,p1[2,]))
rate1<-x1/60 # Estimated p's
rate2<-x2/60
library(car)
plot.new()
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA)
library(sp)
within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
mean(within)
plot(p1[1,which(within==1)], p1[2,which(within==1)], col="blue", ylim=c(0,1),xlim=c(0,1))
points(p1[1,which(within==0)], p1[2,which(within==0)], col="green")
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA, add=T)
The correct code gives the appropriate coverage (around 95%):
x1<-rbinom(300,n,p1[1,])
x2<-rbinom(300,n,p1[2,])
rate1<-x1/60 # Estimated p's
rate2<-x2/60
library(car)
plot.new()
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA)
library(sp)
within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
mean(within)
plot(p1[1,which(within==1)], p1[2,which(within==1)], col="blue", ylim=c(0,1),xlim=c(0,1))
points(p1[1,which(within==0)], p1[2,which(within==0)], col="green")
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA, add=T)
Upvotes: 3