Reputation: 7317
I'm trying to get a posterior distribution using MCMCpack
of a difference between two conversion rates, akin to the A and B Together section of this PyMC tutorial.
I can get the posteriors of the two sampled rates just fine, but I'm struggling how to implement the sampled delta.. Any ideas?
Edit The true delta (which would have been unknown if we hadn't fabricated the data and is what we want to estimate using MCMC) is the difference between the two rates true_p_a
and true_p_b
i.e. 0.01.
# define true success rates
true_p_a = 0.05
true_p_b = 0.04
# set sample sizes
n_samples_a = 1000
n_samples_b = 1000
# fabricate some data
set.seed(10);
obs_a = rbinom(n=n_samples_a, size=1, prob=true_p_a)
set.seed(1);
obs_b = rbinom(n=n_samples_b, size=1, prob=true_p_b)
# what are the observed conversion rates?
mean(obs_a) #0.056
mean(obs_b) #0.042
# convert to number of successes
successes_a = sum(obs_a) #56
successes_b = sum(obs_b) #42
# calculate the posterior
require(MCMCpack)
simulations = 20000
posterior_a = MCbinomialbeta(successes_a ,n_samples_a, alpha=1, beta=1,mc=simulations)
posterior_b = MCbinomialbeta(successes_b ,n_samples_b, alpha=1, beta=1,mc=simulations)
posterior_delta = ????
posterior_density_a = density(posterior_a)
posterior_density_b = density(posterior_b)
# plot the posteriors
require(ggplot2)
ggplot() +
geom_area(aes(posterior_density_a$x, posterior_density_a$y), fill="#7ad2f6", alpha=.5) +
geom_vline(aes(xintercept=.05), color="#7ad2f6", linetype="dotted", size=2) +
geom_area(aes(posterior_density_b$x, posterior_density_b$y), fill="#014d64", alpha=.5) +
geom_vline(aes(xintercept=.04), color="#014d64", linetype="dotted", size=2) +
scale_x_continuous(labels=percent_format(), breaks=seq(0,0.1, 0.01))
Upvotes: 3
Views: 608
Reputation: 126
You are only struggling because you haven't fully adopted a Bayesian mindset. It's totally OK, I had a lot of the same conceptual issues when I started out. (This question is way old, so you've probably figured it out already).
A Bayesian posterior density incorporates all the available information about the distribution of the parameters of the model. Thus to calculate a function of any of the parameters of the model, you simply calculate that function for each draw from the posterior distribution. You don't need to worry about standard errors and asymptotic inference because you already have all the information you need.
In this case, because the difference between the parameters is a constant, and you have plenty of data, there is little uncertainty about delta. It is estimated at a mean of 0.014 with a SD (not SE) of .009.
Your code with finished analysis:
# define true success rates
true_p_a = 0.05
true_p_b = 0.04
# set sample sizes
n_samples_a = 1000
n_samples_b = 1000
# fabricate some data
set.seed(10);
obs_a = rbinom(n=n_samples_a, size=1, prob=true_p_a)
set.seed(1);
obs_b = rbinom(n=n_samples_b, size=1, prob=true_p_b)
# what are the observed conversion rates?
mean(obs_a) #0.056
mean(obs_b) #0.042
# convert to number of successes
successes_a = sum(obs_a) #56
successes_b = sum(obs_b) #42
# calculate the posterior
require(MCMCpack)
simulations = 20000
posterior_a = MCbinomialbeta(successes_a ,n_samples_a, alpha=1, beta=1,mc=simulations)
posterior_b = MCbinomialbeta(successes_b ,n_samples_b, alpha=1, beta=1,mc=simulations)
# Subtract the posterior deltas, look at empirical summaries and plot the empirical density function
posterior_delta = posterior_a - posterior_b
summary(posterior_delta)
require(ggplot2)
ggplot(data.frame(delta=as.numeric(posterior_delta)),aes(x=delta)) + geom_density() + theme_minimal()
Upvotes: 3