chaya
chaya

Reputation: 11

What's the best R function for calculating probability?

The question I'm trying to answer is as follows: "A box contains 15 red marbles and 5 blue marbles. 10 marbles are drawn without replacement. What is the chance at least 2 of them will be red?" What functions in R would help me solve this? I don't need an answer to the question itself, rather a function in R that will help me solve it!

Upvotes: 0

Views: 315

Answers (2)

Chuck P
Chuck P

Reputation: 3923

You don't have to simuate you can calculate exactly with a few r functions.

library(DescTools)

# build a vector with the right marbles
marbles <- c(rep("Red", 15), rep("Blue", 5))

# Our denominator is the total possible ways we
# can draw ten marbles no replacement orde doesn't matter
# CombN from DescTools gives us that number
CombN(marbles, 10, repl = FALSE, ord = FALSE)
#> [1] 184756

# CombSet generates all the permutations
tail(CombSet(marbles, 10, repl = FALSE, ord = FALSE))
#>           [,1]  [,2]  [,3]  [,4]  [,5]  [,6]   [,7]   [,8]   [,9]   [,10] 
#> [184751,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184752,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184753,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184754,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184755,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184756,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"

# We'll put all those combinations in a matrix called x
x <- CombSet(marbles, 10)

# We can search for all rows that have "Red" at least twice
# with x[rowSums(x == "Red") >= 2, ]
tail(x[rowSums(x == "Red") >= 2, ])
#>           [,1]  [,2]  [,3]  [,4]  [,5]  [,6]   [,7]   [,8]   [,9]   [,10] 
#> [184751,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184752,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184753,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184754,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184755,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"
#> [184756,] "Red" "Red" "Red" "Red" "Red" "Blue" "Blue" "Blue" "Blue" "Blue"

# So the total possibilities are
nrow(x)
#> [1] 184756

# And the number that meet the condition is
nrow(x[rowSums(x == "Red") >= 2, ])
#> [1] 184756

# to calulate the probability
# we put nrow(x[rowSums(x == "Red") >= 2, ]) over
# nrow(x)
nrow(x[rowSums(x == "Red") >= 2, ]) / nrow(x)
#> [1] 1

# The probability is 1

### a more reasonable game
# same marbles
marbles <- c(rep("Red", 15), rep("Blue", 5))

# Draw 5 without replacement what is the probability
# we'll draw exactly two blue

totalpossiblepicks <- CombN(marbles, 5, repl = FALSE, ord = FALSE)

matrixofpicks <- CombSet(marbles, 5)

desiredoutcomes <- nrow(matrixofpicks[rowSums(matrixofpicks == "Blue") == 2, ])
# calculate probability
desiredoutcomes / totalpossiblepicks
#> [1] 0.2934727

Upvotes: 1

www
www

Reputation: 39154

I have a code to run simulation to calculate the probability. However, as I mentioned in the comment. The example you gave will always lead to 1, which is not exciting.

# Generate a vector containing 15 red marbles and 5 blue marbles
box <- c(rep("red", 15), rep("blue", 5))

# set seed for reproducibility
set.seed(124)

# Size of sampling
N <- 10

# A function to run simulation, taking N marbels without replacement
sim_fun <- function(x) sample(box, size = N)

# Number of simulation
S <- 10000

# Run the simulation S times
sims <- replicate(S, expr = sim_fun(), simplify = FALSE)

# Calculate the proportion that at least two marbles are red
mean(sapply(sims, function(x){
  sum(x == "red") >= 2
}))
# [1] 1

Upvotes: 1

Related Questions