Reputation: 13
I have some data that looks like this:
x y
1: 3 1
2: 6 1
3: 1 0
4: 31 8
5: 1 0
---
(Edit: if it helps, here are sample vectors for x and y
x = c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)
y = c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)
The column on the left (x) is my sample size, and the column on the right (y) is the number successes that occur in each sample.
I would like to fit these data using a binomial distribution in order to find the probability of a success (p). All examples for fitting a binomial distribution that I've found so far assume a constant sample size (n) across all data points, but here I have varying sample sizes.
How do I fit data like these, with varying sample sizes, to a binomial distribution? The desired outcome is p, the probability of observing a success in a sample size of 1.
How do I accomplish a fit like this using R?
(Edit #2: Response below outlines solution and related R code if I assume that the events observed in each sample can be assumed to be independent, in addition to assuming that the samples themselves are also independent. This works for my data - thanks!)
Upvotes: 1
Views: 6140
Reputation: 61
library(bbmle)
x = c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)
y = c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)
mf = function(prob, x, size){
-sum(dbinom(x, size, prob, log=TRUE))
}
m1 = mle2(mf, start=list(prob=0.01), data=list(x=y, size=x))
print(m1)
Coefficients: prob 0.1151535
Log-likelihood: -13.47
Upvotes: 2
Reputation: 5281
What about calculating the empirical probability of success
x <- c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)
y <- c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)
avr.sample <- mean(x)
avr.success <- mean(y)
p <- avr.success/avr.sample
[1] 0.1151515
Or using binom.test
z <- x-y # number of fails
binom.test(x = c(sum(y), sum(z)))
Exact binomial test
data: c(sum(y), sum(z))
number of successes = 19, number of trials = 165, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.07077061 0.17397215
sample estimates:
probability of success
0.1151515
However, this assumes that:
This means in every iteration k
of the experiment (i.e. row of x
) we execute an action such as throwing x[k]
identical dices (not necessarily fair dices) and success would mean to get a given (predetermined) number n
in 1:6
.
If we supposed that that above results were achieved when trying to get a 1
when throwing x[k]
dices in every iteration k
, then one could say that the empirical probability of getting a 1
is (~) 0.1151515
.
In the end, the distribution in question would be B(sum(x), p)
.
PS: In the above illustration, the dices are identical to each other not only in any given iteration but across all iterations.
Upvotes: 2