Finding coverage using Bivariate Ellipse

Question

I am struggling with a problem on estimation. In several instances I have shown how to calculate a Bivariate Ellipse using a vector of points generated by Bivariate Normal distribution. The code works fine except that the coverage (the number of times the generated or true ps(p1,p2) are contained in the estimated ellipse) I am getting seems extremely low. I should also state that the old version of R gave significantly different results compared to the new version. I am now using R 3.0.1. Here is the code to be able to reproduce the problem.

  library(MASS)
  set.seed(1234)
  x1<-NULL
  x2<-NULL
  k<-1
  Sigma2 <- matrix(c(.72,.57,.57,.46),2,2)
  Sigma2
  rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2])
  eta<-replicate(300,mvrnorm(k, mu=c(-1.01,-2.39), Sigma2)) 
  p1<-exp(eta)/(1+exp(eta)) # true p's
  n<-60
  x1<-replicate(300,rbinom(k,n,p1[1,])) 
  x2<-replicate(300,rbinom(k,n,p1[2,]))

  rate1<-x1/60  # Estimated p's
  rate2<-x2/60
  library(car)
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95))
  library(sp)
  within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
  mean(within)    # coverage

Nate Pope · Accepted Answer

The mistake lies in the lines:

x1<-replicate(300,rbinom(k,n,p1[1,])) 
x2<-replicate(300,rbinom(k,n,p1[2,]))

Because k=1, the call rbinom(k,n,p1[1,]) generates a single random deviate, and only the first probability in p1[1,] is used. You are replicating this call 300 times, and so are using the same probability for each deviate. Therefore, rate1 and rate2 occupy a far smaller parameter space than does p1. Visualize this by plotting p1 over your data ellipse:

x1<-replicate(300,rbinom(k,n,p1[1,])) 
x2<-replicate(300,rbinom(k,n,p1[2,]))

rate1<-x1/60  # Estimated p's
rate2<-x2/60
library(car)
plot.new()
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA)
library(sp)
within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
mean(within)  

plot(p1[1,which(within==1)], p1[2,which(within==1)], col="blue", ylim=c(0,1),xlim=c(0,1))
points(p1[1,which(within==0)], p1[2,which(within==0)], col="green")

ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA, add=T)

The correct code gives the appropriate coverage (around 95%):

x1<-rbinom(300,n,p1[1,])
x2<-rbinom(300,n,p1[2,])
rate1<-x1/60  # Estimated p's
rate2<-x2/60
library(car)
plot.new()
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA)
library(sp)
within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
mean(within)  

plot(p1[1,which(within==1)], p1[2,which(within==1)], col="blue", ylim=c(0,1),xlim=c(0,1))
points(p1[1,which(within==0)], p1[2,which(within==0)], col="green")
ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=T, pch=NA, add=T)

Finding coverage using Bivariate Ellipse

Answers (1)

Related Questions