Funkytown

Reputation: 560

R - How to Speed Up Recursion and Double Summation

Since this is essentially a question about how to efficiently perform a computation in R, I will start with the equation and the code, and then explain the underlying problem afterwards for those who find it useful or interesting.

I have written a script in R to generate values using the following function:

Success probability when the dice explode, P(n, t):

P(n,t) = \sum_{i=t}^{n} \binom{n}{i} \left(\frac{1}{3}\right)^{i} \left(\frac{2}{3}\right)^{n-i} + \sum_{k=1}^{t-1} \sum_{j=0}^{k-1} \binom{n}{n-k} \left(\frac{1}{6}\right)^{j} \left(\frac{1}{6}\right)^{k-j} \left(\frac{2}{3}\right)^{n-k} P(k-j,\ t-k)

The function, as you can see, is recursive and involves double summation. It works well for small numbers around 15 or lower, but the execution time gets prohibitively long at higher values of n and t. I need to be able to perform the calculation for every n and t pair from 1 to 30. Is there a way to write a script that won't take months to execute?

My current script is:

explProb <- function(n,t) {
    prob <- 0

    #################################
    # FIRST PART - SINGLE SUMMATION
    #################################
    # Probability of reaching the threshold t on the initial roll:
    # at least t hits (a 5 or 6, probability 1/3 each) among the n dice.
    if(t <= n) {
        i <- t:n
        prob <- sum(choose(n,i)*((1/3)^i)*((2/3)^(n-i)))
    }

    #################################
    # SECOND PART - DOUBLE SUMMATION
    #################################
    # Fall short on the first roll: k hits, of which j are fives and k-j are
    # sixes; the k-j sixes explode and must supply the remaining t-k hits.
    # The recursive call is evaluated element-wise (sapply) so that each
    # value of k-j gets its own probability.
    if(t >= 2) {
        for(k in 1:(t-1)) {
            j <- 0:(k-1)
            prob <- prob + sum(choose(n,n-k)*((1/6)^j)*((1/6)^(k-j))*((2/3)^(n-k))*
                                   sapply(k-j, explProb, t = t-k))
        }
    }

    return(prob)
}
MAX_DICE <- 30
MAX_THRESHOLD <- 30
probabilities <- matrix(0, MAX_DICE, MAX_THRESHOLD)

for(dice in 1:MAX_DICE) {
    for(threshold in 1:MAX_THRESHOLD) {
        #print(sprintf("DICE = %d : THRESH = %d", dice, threshold))
        probabilities[dice, threshold] <- explProb(dice, threshold)
    }
}

I am trying to write a script to generate a set of probabilities for a particular type of dice roll in a tabletop roleplaying game (Shadowrun 5th Edition, to be specific). The type of dice roll is called an "Exploding Dice Roll". In case you are not familiar with how these rolls work in this game, let me briefly explain.

Whenever you try to accomplish a task you make a test by rolling a number of six-sided dice. Your goal is to get a predetermined number of "hits" when rolling those dice. A "hit" is defined as a 5 or 6 on a six-sided die. So, for example, if you have a dice pool of 5 dice and you roll 1, 3, 3, 5, 6, then you have gotten 2 hits.

In some cases you are allowed to re-roll all of the 6's that were rolled in order to try to get MORE hits. This is called an "exploding" roll. The 6's count as hits, but can be re-rolled to "explode" into even more hits. For clarification I'll give a quick example...

If you roll 10 dice and get a result of 1, 2, 2, 4, 5, 5, 6, 6, 6, 6, then you have gotten 6 hits on the first roll. However, the 4 dice that rolled 6's can be re-rolled. If you roll those dice and get 3, 5, 6, 6, then you have 3 more hits for a total of 9. But now you can re-roll the two new sixes, and so on. You keep re-rolling the sixes, adding the 5's and 6's to your total hits, until you get a roll with no sixes.

The function listed above generates these probabilities, taking as input the number of dice and the threshold number of hits:

n = # of dice being rolled
t = threshold number of "hits" to be reached
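
For illustration, a memoized variant of the same recursion caches each (n, t) result so the 30-by-30 table no longer recomputes identical subproblems. The wrapper name explProbMemo and the environment cache are illustrative additions, not part of the original script.

explProbMemo <- local({
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(n, t) {
    key <- paste(n, t)
    if (exists(key, envir = cache, inherits = FALSE)) return(get(key, envir = cache))
    prob <- 0
    # Single summation: reach the threshold on the initial roll.
    if (t <= n) {
      i <- t:n
      prob <- sum(choose(n, i) * (1/3)^i * (2/3)^(n - i))
    }
    # Double summation: fall short, then recurse on the re-rolled sixes.
    if (t >= 2) {
      for (k in 1:(t - 1)) {
        j <- 0:(k - 1)
        prob <- prob + sum(choose(n, n - k) * (1/6)^j * (1/6)^(k - j) *
                             (2/3)^(n - k) * sapply(k - j, explProbMemo, t = t - k))
      }
    }
    assign(key, prob, envir = cache)
    prob
  }
})

With the cache in place, filling the full probabilities matrix should take seconds rather than growing exponentially with t.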

Upvotes: 10

Views: 856

Answers (2)

A. Webb

Reputation: 26446

Efficient Calculation Using Probability Distributions

The approaches in the question and in my other answer use sums over transitions of dependent binomial distributions. The dependency, arising from the carry-over of previous successes (hits) into subsequent trials (rolls), complicates the calculations.

An alternative approach is to view each die separately: roll a single die as long as it keeps turning up a hit. Each die is independent of the others, so the per-die random variables can be summed efficiently through convolution. Better still, the number of hits from a single die follows a geometric distribution, and the sum of independent geometric random variables follows a negative binomial distribution.
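
As a quick check of that claim (an illustration of mine, using the same prob = 2/3 that appears below), convolving two per-die geometric distributions reproduces the size-2 negative binomial:

x <- 0:10
g <- dgeom(x, prob = 2/3)                              # hits from a single die
two_dice <- convolve(g, rev(g), type = "open")[1:11]   # pmf of the sum of two dice
all.equal(two_dice, dnbinom(x, size = 2, prob = 2/3))  # TRUE (up to FFT rounding)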

R provides the negative binomial distribution, so the results obtained in my other answer may be had all at once by

round(dnbinom(0:19,10,prob=2/3),4)
 [1] 0.0173 0.0578 0.1060 0.1413 0.1531 0.1429 0.1191 0.0907 0.0643 0.0428
[11] 0.0271 0.0164 0.0096 0.0054 0.0030 0.0016 0.0008 0.0004 0.0002 0.0001

The probability matrix in the question, with MAX_DICE=MAX_THRESHOLD=10, has first column equal to

1-dnbinom(0,1:10,prob=2/3)

So, you might be looking for the cumulative distribution function. I have not been able to figure out your intentions with the subsequent columns, but perhaps the goal was

outer(1:10,0:10,function(size,x) 1-dnbinom(x,size,prob=2/3))
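
If instead each column is meant to hold the probability of reaching at least the threshold, as the question's loop suggests, a sketch using the negative binomial upper tail would cover the full 30-by-30 case; the choice of pnbinom here is my assumption, not part of the original answer.

# P(at least t hits with n dice) as a negative binomial upper tail.
outer(1:30, 1:30, function(n, t) 1 - pnbinom(t - 1, size = n, prob = 2/3))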

Upvotes: 3

A. Webb

Reputation: 26446

Calculation with Transition Matrix

If we have n=10 dice, then the probability of 0 to 10 occurrences of an event with prob=2/6 may be efficiently calculated in R as

dbinom(0:10,10,2/6)

Since you are allowed to keep rolling until failure, any number of ultimate hits is possible (the support of the distribution is [0,Inf)), albeit with geometrically diminishing probabilities. A recursive numeric solution is feasible, provided we cut off once probabilities fall below machine precision and censor at the threshold.

Since rerolls are with a smaller number of dice, it makes sense to precalculate all transition probabilities.

X<-outer(0:10,0:10,function(x,size) dbinom(x,size,2/6))

Here the entry in the i-th row and j-th column gives the probability of (i-1) successes (hits) from (j-1) trials (dice rolled). For example, the probability of exactly 1 success with 6 trials is located at X[2,7].
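
A quick spot check of that indexing (my addition):

all.equal(X[2, 7], dbinom(1, 6, 2/6))  # exactly 1 success from 6 dice: TRUE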

Now if you start out with 10 dice, we can represent this as the vector

d<-c(rep(0,10),1) 

This shows that with probability 1 we have 10 live dice, and probability 0 for every other count.

After a single roll, the distribution of the number of live dice is X %*% d. After two rolls, it is X %*% X %*% d. We can calculate the live-dice state probabilities after any number of rolls by iterating.

T<-Reduce(function(dn,n) X %*% dn,1:11,d,accumulate=TRUE)

Here T[[1]] gives the probabilities of live dice before the first roll and T[[11]] gives them before the 11th roll (after the 10th).
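
As a sanity check (my addition, assuming X, d, and T as defined above), the state after one roll is just the plain binomial from the start of this answer:

all.equal(as.vector(T[[2]]), dbinom(0:10, 10, 2/6))  # TRUE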

This is sufficient to calculate expected values, but for the distribution of cumulative sums, we'll need to track additional information in the state. The following function reshapes a state matrix at each step so that the i-th row and j-th column has the probability of (i-1) live dice with a current cumulative total of j-1.

step<-function(m) {
  # m[i,j]: probability of (i-1) live dice with a cumulative total of (j-1).
  # Each live die has just scored a hit, so shift every entry to the right
  # by its live-dice count: new column index = row + column - 1.
  idx<-arrayInd(seq_along(m),dim(m))
  idx[,2]<-rowSums(idx)-1
  i<-idx[nrow(idx),]            # dimensions of the shifted matrix
  m2<-matrix(0,i[1],i[2])
  m2[idx]<-m                    # scatter the probabilities into their new cells
  return(m2)
}
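
A tiny worked example (the values are mine) shows the reshape in action: each entry moves right by its live-dice count.

m <- matrix(1:4, 2, 2)   # rows: 0-1 live dice; columns: totals 0-1
step(m)
#>      [,1] [,2] [,3]
#> [1,]    1    3    0
#> [2,]    0    2    4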

In order to recover the probabilities for cumulative totals, we use the following convenience function to sum across anti-diagonals

conv<-function(m) 
  tapply(c(m),c(row(m)+col(m)-2),FUN=sum)
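
For example (my addition), on the same toy matrix the anti-diagonal sums give the probabilities of totals 0, 1, and 2:

conv(matrix(1:4, 2, 2))
#> 0 1 2 
#> 1 5 4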

The probabilities of continuing to roll rapidly diminish, so I've cut off at 40, and shown up to 20, rounded to 4 places

round(conv(Reduce(function(mn,n) X %*% step(mn), 1:40, X %*% d))[1:21],4)

#>      0      1      2      3      4      5      6      7      8      9 
#> 0.0173 0.0578 0.1060 0.1413 0.1531 0.1429 0.1191 0.0907 0.0643 0.0428 
#> 
#>     10     11     12     13     14     15     16     17     18     19   
#> 0.0271 0.0164 0.0096 0.0054 0.0030 0.0016 0.0008 0.0004 0.0002 0.0001  

Calculation with Simulation

This can also be calculated in reasonable time with reasonable precision using simple simulation.

We simulate a roll of n 6-sided dice with sample(1:6,n,replace=TRUE), calculate the number to re-roll, and iterate until none are available, counting "hits" along the way.

sim<-function(n) {
  k<-0
  while(n>0) {
    roll<-sample(1:6,n,replace=TRUE)
    n<-sum(roll>=5)   # hits this roll; these dice are rolled again
    k<-k+n            # accumulate hits
  }
  return(k)
}

Now we can simply replicate a large number of trials and tabulate

prop.table(table(replicate(100000,sim(10))))

#>     0      1      2      3      4      5      6      7      8      9  
#> 0.0170 0.0588 0.1053 0.1431 0.1518 0.1433 0.1187 0.0909 0.0657 0.0421 
#>
#>     10     11     12     13     14     15     16     17     18     19
#> 0.0252 0.0161 0.0102 0.0056 0.0030 0.0015 0.0008 0.0004 0.0002 0.0001

This is quite feasible even with 30 dice (a few seconds even with 100,000 replications).
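
As a sketch of how this could produce the full matrix from the question (the seed, replication count, and structure here are my own choices), simulate each pool size and tabulate the empirical tail probabilities:

set.seed(1)                      # illustrative seed
MAX_DICE <- 30
MAX_THRESHOLD <- 30
REPS <- 10000                    # replications per pool size
probabilities <- t(sapply(1:MAX_DICE, function(n) {
  hits <- replicate(REPS, sim(n))
  sapply(1:MAX_THRESHOLD, function(t) mean(hits >= t))  # P(at least t hits)
}))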

Upvotes: 5
