dtcitron

Reputation: 13

Fitting Binomial Distribution in R using data with varying sample sizes

I have some data that looks like this:

    x    y
1:  3    1
2:  6    1
3:  1    0
4: 31    8
5:  1    0
---

Edit: if it helps, here are sample vectors for x and y:

    x = c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)

    y = c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)

The column on the left (x) is my sample size, and the column on the right (y) is the number of successes that occur in each sample.

I would like to fit these data using a binomial distribution in order to find the probability of a success (p). All examples for fitting a binomial distribution that I've found so far assume a constant sample size (n) across all data points, but here I have varying sample sizes.

  1. How do I fit data like these, with varying sample sizes, to a binomial distribution? The desired outcome is p, the probability of observing a success in a sample of size 1.

  2. How do I accomplish a fit like this using R?

(Edit #2: The response below outlines a solution and related R code, assuming that the events observed within each sample are independent, and that the samples themselves are independent as well. This works for my data - thanks!)

Upvotes: 1

Views: 6140

Answers (2)

hanice

Reputation: 61

library(bbmle)

x = c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)
y = c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)

# Negative log-likelihood: each row contributes a binomial term with its own sample size
mf = function(prob, x, size){
  -sum(dbinom(x, size, prob, log=TRUE))
}

# Pass the successes (y) as x and the varying sample sizes (x) as size
m1 = mle2(mf, start=list(prob=0.01), data=list(x=y, size=x))
print(m1)

Coefficients:
     prob 
0.1151535 

Log-likelihood: -13.47
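
As a cross-check (my own addition, assuming the standard base-R approach applies here), the same pooled estimate can be recovered with an intercept-only binomial GLM: cbind(successes, failures) tells glm the varying sample size of each row, and plogis() maps the fitted intercept from the log-odds scale back to a probability.

# Sketch of an equivalent fit with base R's glm (intercept-only logistic regression)
fit <- glm(cbind(y, x - y) ~ 1, family = binomial)
plogis(coef(fit))  # back-transforms the intercept to the probability scale, roughly 0.115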

Upvotes: 2

niko

Reputation: 5281

What about calculating the empirical probability of success?

x <- c(3, 6, 1, 31, 1, 18, 73, 29, 2, 1)
y <- c(1, 1, 0, 8, 0, 0, 8, 1, 0, 0)

avr.sample <- mean(x)   # average sample size per row
avr.success <- mean(y)  # average number of successes per row

p <- avr.success/avr.sample
p
[1] 0.1151515
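
Since both means divide by the same number of rows, mean(y)/mean(x) reduces to sum(y)/sum(x), i.e. total successes over total trials, which is also the maximum-likelihood estimate for independent binomial counts:

sum(y)/sum(x)  # 19 successes out of 165 trials
[1] 0.1151515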

Or using binom.test

z <- x - y # number of failures in each row
binom.test(x = c(sum(y), sum(z)))
Exact binomial test

data:  c(sum(y), sum(z))
number of successes = 19, number of trials = 165, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.07077061 0.17397215
sample estimates:
probability of success 
             0.1151515 

However, this assumes that:

  1. The events corresponding to the rows are independent of each other
  2. The events within the same row are independent of each other as well

This means that in every iteration k of the experiment (i.e. row of x) we execute an action such as throwing x[k] identical dice (not necessarily fair dice), and a success means getting a given (predetermined) number n in 1:6.

If we suppose that the above results were obtained while trying to get a 1 when throwing x[k] dice in every iteration k, then one could say that the empirical probability of getting a 1 is (~) 0.1151515.

In the end, the distribution in question would be B(sum(x), p).

PS: In the above illustration, the dice are identical to each other not only within any given iteration but across all iterations.
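
As a quick sanity check of the B(sum(x), p) claim (an illustrative sketch of my own, not from the original answer), one can simulate each row as an independent draw from B(x[k], p) and compare the simulated totals of successes with the observed sum(y) = 19:

set.seed(1)
p.hat <- sum(y)/sum(x)
# each replicate draws one binomial count per row (with that row's sample size), then sums them
sims <- replicate(1000, sum(rbinom(length(x), size = x, prob = p.hat)))
mean(sims)                        # close to sum(x) * p.hat = 19
quantile(sims, c(0.025, 0.975))   # the observed total should fall well inside this range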

Upvotes: 2
