jenswirf
jenswirf

Reputation: 7317

How to get a posterior of a difference using MCMCpack?

I'm trying to get a posterior distribution using MCMCpack of a difference between two conversion rates, akin to the A and B Together section of this PyMC tutorial.

I can get the posteriors of the two sampled rates just fine, but I'm struggling how to implement the sampled delta.. Any ideas?

Edit The true delta (which would have been unknown if we hadn't fabricated the data and is what we want to estimate using MCMC) is the difference between the two rates true_p_a and true_p_b i.e. 0.01.

enter image description here

# define true success rates
true_p_a = 0.05
true_p_b = 0.04

# set sample sizes 
n_samples_a = 1000
n_samples_b = 1000

# fabricate some data
set.seed(10); 
obs_a = rbinom(n=n_samples_a, size=1, prob=true_p_a)
set.seed(1);  
obs_b = rbinom(n=n_samples_b, size=1, prob=true_p_b)

# what are the observed conversion rates?
mean(obs_a) #0.056
mean(obs_b) #0.042

# convert to number of successes
successes_a = sum(obs_a) #56
successes_b = sum(obs_b) #42

# calculate the posterior
require(MCMCpack)

simulations = 20000

posterior_a = MCbinomialbeta(successes_a ,n_samples_a, alpha=1, beta=1,mc=simulations)
posterior_b = MCbinomialbeta(successes_b ,n_samples_b, alpha=1, beta=1,mc=simulations)

posterior_delta = ????                                         

posterior_density_a = density(posterior_a)
posterior_density_b = density(posterior_b)


# plot the posteriors
require(ggplot2)
ggplot() + 
  geom_area(aes(posterior_density_a$x, posterior_density_a$y), fill="#7ad2f6", alpha=.5) +  
  geom_vline(aes(xintercept=.05), color="#7ad2f6", linetype="dotted", size=2) +
  geom_area(aes(posterior_density_b$x, posterior_density_b$y), fill="#014d64", alpha=.5) +  
  geom_vline(aes(xintercept=.04), color="#014d64", linetype="dotted", size=2) +
  scale_x_continuous(labels=percent_format(), breaks=seq(0,0.1, 0.01))

Upvotes: 3

Views: 608

Answers (1)

Robert Kubinec
Robert Kubinec

Reputation: 126

You are only struggling because you haven't fully adopted a Bayesian mindset. It's totally OK, I had a lot of the same conceptual issues when I started out. (This question is way old, so you've probably figured it out already).

A Bayesian posterior density incorporates all the available information about the distribution of the parameters of the model. Thus to calculate a function of any of the parameters of the model, you simply calculate that function for each draw from the posterior distribution. You don't need to worry about standard errors and asymptotic inference because you already have all the information you need.

In this case, because the difference between the parameters is a constant, and you have plenty of data, there is little uncertainty about delta. It is estimated at a mean of 0.014 with a SD (not SE) of .009.

Your code with finished analysis:

  # define true success rates
  true_p_a = 0.05
  true_p_b = 0.04

  # set sample sizes 
  n_samples_a = 1000
  n_samples_b = 1000

  # fabricate some data
  set.seed(10); 
  obs_a = rbinom(n=n_samples_a, size=1, prob=true_p_a)
  set.seed(1);  
  obs_b = rbinom(n=n_samples_b, size=1, prob=true_p_b)

  # what are the observed conversion rates?
  mean(obs_a) #0.056
  mean(obs_b) #0.042

  # convert to number of successes
  successes_a = sum(obs_a) #56
  successes_b = sum(obs_b) #42

  # calculate the posterior
  require(MCMCpack)

  simulations = 20000

  posterior_a = MCbinomialbeta(successes_a ,n_samples_a, alpha=1, beta=1,mc=simulations)
  posterior_b = MCbinomialbeta(successes_b ,n_samples_b, alpha=1, beta=1,mc=simulations)

  # Subtract the posterior deltas, look at empirical summaries and plot the empirical density function

  posterior_delta = posterior_a - posterior_b

  summary(posterior_delta)

  require(ggplot2)

  ggplot(data.frame(delta=as.numeric(posterior_delta)),aes(x=delta)) + geom_density() + theme_minimal()

Upvotes: 3

Related Questions