user182513
user182513

Reputation:

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. can you please provide me with algorithms , C++ codes or ideas?

This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009. Input

The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000. Output

For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009. Example

Input:
3
3 1
5 2
10 3

Output:
3
10
120

Upvotes: 20

Views: 31559

Answers (5)

AndyUpNorth
AndyUpNorth

Reputation: 1

The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.

binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5

do while col < n
    binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
    output(binomial_coefficient)
    col = col + 1
loop

The output of binomial coefficients is therefore:

1
1 *  (5 + 1 - (0 + 1)) / (0 + 1) = 5 
5 *  (5 + 1 - (1 + 1)) / (1 + 1) = 15
15 * (5 + 1 - (2 + 1)) / (2 + 1) = 15 
15 * (5 + 1 - (3 + 1)) / (3 + 1) = 5 
5 *  (5 + 1 - (4 + 1)) / (4 + 1) = 1

I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(

Upvotes: -1

Aryabhatta
Aryabhatta

Reputation:

Notice that 1009 is a prime.

Now you can use Lucas' Theorem.

Which states:

Let p be a prime.
If n  = a1a2...ar when written in base p and
if k  = b1b2...br when written in base p

(pad with zeroes if required)

Then

(n choose k) modulo p = (a1 choose b1) * (a2 choose  b2) * ... * (ar choose br) modulo p.

i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.

Thus your problem is reduced to finding the product modulo 1009 of at most log N/log 1009 numbers (number of digits of N in base 1009) of the form a choose b where a <= 1009 and b <= 1009.

This should make it easier even when N is close to 10^15.

Note:

For N=10^15, N choose N/2 is more than 2^(100000000000000) which is way beyond an unsigned long long.

Also, the algorithm suggested by Lucas' theorem is O(log N) which is exponentially faster than trying to compute the binomial coefficient directly (even if you did a mod 1009 to take care of the overflow issue).

Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):

class Binomial
{
public:
    Binomial(int Max)
    {
        max = Max+1;
        table = new unsigned int * [max]();
        for (int i=0; i < max; i++)
        {
            table[i] = new unsigned int[max]();

            for (int j = 0; j < max; j++)
            {
                table[i][j] = 0;
            }
        }
    }

    ~Binomial()
    {
        for (int i =0; i < max; i++)
        {
            delete table[i];
        }
        delete table;
    }

    unsigned int Choose(unsigned int n, unsigned int k);

private:
    bool Contains(unsigned int n, unsigned int k);

    int max;
    unsigned int **table;
};

unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
    if (n < k) return 0;
    if (k == 0 || n==1 ) return 1;
    if (n==2 && k==1) return 2;
    if (n==2 && k==2) return 1;
    if (n==k) return 1;


    if (Contains(n,k))
    {
        return table[n][k];
    }
    table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
    return table[n][k];
}

bool Binomial::Contains(unsigned int n, unsigned int k)
{
    if (table[n][k] == 0) 
    {
        return false;
    }
    return true;
}

Upvotes: 27

Steve Jessop
Steve Jessop

Reputation: 279195

Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.

Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.

After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.

Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.

So for example consider the case of C((1009^3), 200).

Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.

On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.

That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.

We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).

Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!

Upvotes: 8

dmuir
dmuir

Reputation: 449

I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14) will require something like 1e16 bits ~ 1000 terabytes

Moreover executing the loop in snakiles answer 1e15 times seems like it might take a while. What you might use is, if

n = n0 + n1*p + n2*p^2 ... + nd*p^d

m = m0 + m1*p + m2*p^2 ... + md*p^d

(where 0<=mi,ni < p)

then C(n,m) = C(n0,m0) * C(n1,m1) *... * C(nd, nd) mod p

see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html

One way would be to use pascal's triangle to build a table of all C(m,n) for 0<=m<=n<=1009.

Upvotes: 5

snakile
snakile

Reputation: 54521

psudo code for calculating nCk:

result = 1    
for i=1 to min{K,N-K}:
   result *= N-i+1
   result /= i
return result

Time Complexity: O(min{K,N-K})

The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's ok because

C(k,n) = C(k, n-k)

And you can calculate the thing even more efficiently if you use the GammaLn function.

nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))

The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.

Upvotes: 2

Related Questions