Farzin Shamloo
Farzin Shamloo

Reputation: 21

Distribution of the difference of two beta random variables: Issue implementing it

I have two proportions (p1, p2) and I want to characterize p1-p2. Assuming Beta prior for each, the posterior distributions of each will also be Beta. My original question was about the distribution of the difference of two Beta distributions. I found one solution (brute force) here and another solution (closed form) here. I implemented both and they matched for a few examples I did. The brute force code:

## prior for both groups
a_pri <- 1
b_pri <- 1
## observed success and N for each group
g1_succ <- 8
g1_N <- 10
g2_succ <- 6
g2_N <- 10
## posterior parameters
a_g1 <- a_pri + g1_succ
b_g1 <- b_pri + g1_N - g1_succ

a_g2 <- a_pri + g2_succ
b_g2 <- b_pri + g2_N - g2_succ
## sampling from each group's posterior
samp_g1 <- rbeta(n = 10000,shape1 = a_g1,shape2 = b_g1)
samp_g2 <- rbeta(n = 10000,shape1 = a_g2,shape2 = b_g2)
## brute force the difference and visualization:
samp_diff <- samp_g1 - samp_g2

df_BF <- data.frame(samp_g1,samp_g2,s_diff= samp_diff) 
ggplot(df_BF)+geom_density(aes(x=s_diff))+xlim(-1,1)

This works well, and this is my implementation of the Pham-Gia and Turkkan (1993) paper:

AA1 <- a_g1;BB1 <- b_g1;AA2 <- a_g2;BB2 <- b_g2
## writing the function, only function of PP (the difference)
func_PD <- function(PP){  
  A <- beta(AA1, BB1) * beta(AA2, BB2)
  if(PP > 0 & PP <= 1){
    f_p <- beta(AA2,BB1)*(PP^(BB1+BB2-1))*((1-PP)^(AA2+BB1-1))*
      tolerance::F1(a = BB1, b = AA1+BB1+AA2+BB2-2,
                    b.prime = 1-AA1, c = BB1+AA2, x = 1-PP, y = 1-PP^2)/A
  }
  if(PP >= -1 & PP < 0){
    f_p <- beta(AA1,BB2)*((-PP)^(BB1+BB2-1))*((1+PP)^(AA1+BB2-1))*
      tolerance::F1(a = BB2, b = 1-AA2,
                    b.prime = AA1+BB1+AA2+BB2-2, c = AA1+BB2, x = 1-PP^2, y = 1+PP)/A
  }
  if(PP==0){
    if(AA1+AA2>1 & BB1+BB2>0){
      f_p <- beta(AA1+AA2-1,BB1+BB2-1)/A
    }
    if(!(AA1+AA2>1 & BB1+BB2>0)){
      f_p <- NA
    }
  }
  return(f_p)
}
## vectorizing it:
func_PD_vec <- Vectorize(func_PD)
## setting x values varying from -1 to 1:
x <- seq(-0.99,0.99,0.01)
## getting the pdf of difference:
y <- func_PD_vec(x)
## getting cdf by integrating:
df <- data.frame(p=x,pdf=y)
df$cdf <- NA
A_ch <- 0
for (i in 1:nrow(df)){
  if(A_ch > 0.995) {break}
  KK <- integrate(func_PD_vec, lower=-1, upper=df$p[i])
  df$cdf[i] <- KK$value
  A_ch <- df$cdf[i]
  # df$cdf[i] <- trapzfun(func_PD, a=-1,b= df$p[i])$value
}
df$cdf[which(is.na(df$cdf))] <- 1
## visualizing:
ggplot(df,aes(p,pdf))+geom_point()

With the data I set up there (8/10 vs 6/10), everything works well, and the two methods match, as they should. But when I change it to:

g1_succ <- 80
g1_N <- 100
g2_succ <- 60
g2_N <- 100

and I repeat everything, I get an error at this step:

## getting the pdf of difference:
y <- func_PD_vec(x)

I repeated this step using a for loop to find the first x for which this error occured, and found the first error occurs with x=-0.02, and this is error I get:

"Error in integrate(A1.simple, 0, 1, a = a, b = b, b.prime = b.prime, c = c, : non-finite function value"

Is there a way to fix it? I assumed it's caused by F1 function that calculates Appell's F1 Hypergeometric Function, but not sure how to fix it.

I tried looking for other (maybe more stable, or efficient) implementations but couldn't find one.

Thanks!

Upvotes: 0

Views: 175

Answers (1)

ncb_anon
ncb_anon

Reputation: 1

I've also had similar problems with the source you have been using.

Instead I have found the functions included in the 'phase1b' package to be more stable at https://github.com/Genentech/phase1b

The functions dbetadiff and pbetadiff are what you need.

 #devtools::install_github("https://github.com/Genentech/phase1b/", force = TRUE)
library(phase1b)
library(ggplot2)
## prior for both groups
a_pri <- 1
b_pri <- 1
## observed success and N for each group
g1_succ <- 80
g1_N <- 100
g2_succ <- 60
g2_N <- 100
## posterior parameters
a_g1 <- a_pri + g1_succ
b_g1 <- b_pri + g1_N - g1_succ

a_g2 <- a_pri + g2_succ
b_g2 <- b_pri + g2_N - g2_succ

x <- seq(-0.99,0.99,0.01)
df <- data.frame(p=x,pdf=dbetadiff(x,parX=c(a_g2,b_g2),parY =c(a_g1,b_g1) ))
ggplot(df,aes(p,pdf))+geom_line()

for (i in 1:nrow(df)){
   df$cdf[i] <- pbetadiff(x[i],parX=c(a_g2,b_g2),parY =c(a_g1,b_g1) )
 }
ggplot(df,aes(p,cdf))+geom_line()

I've also found functions ddiffprop and pdiffprop in the 'tolerance' package which work well.

Upvotes: 0

Related Questions