Reputation: 9798
I know you can get the number of combinations with the following formula (without repetition, and order is not important):
// Choose r from n
n! / (r! * (n - r)!)
However, I don't know how to implement this in C++, since for instance with n = 52, n! = 8.0658175170943878571660636856404e+67 and the number gets far too big even for unsigned __int64 (or unsigned long long). Is there some workaround to implement the formula without any third-party "bigint" libraries?
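(For what it's worth, the final result is small even when the factorials are not: C(52, 5) = (52 * 51 * 50 * 49 * 48) / (5 * 4 * 3 * 2 * 1) = 311,875,200 / 120 = 2,598,960, which fits easily into 64 bits. It is only the intermediate factorials that overflow.)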
Upvotes: 31
Views: 26483
Reputation: 1908
I posted an answer to a similar question which got deleted, so here is my effort in C for smallish n. Non-zero returned results are correct.
uint64_t nchoosek(uint64_t n, uint64_t k) {
    // No overflow for n <= 62.
    if ((n == 0) && (k == 0)) return 1;
    if (n == 0) return 0;
    if (k > n) return 0;
    if ((k == 0) || (k == n)) return 1;
    if (n - k < k) k = n - k;
    uint64_t res = n;
    if ((n > 62) && (n + k) >= 91) {
        for (uint32_t i = 2; i <= k; i++) {
            uint64_t temp = n - i + 1;
            if (res > 0xffffffffffffffffULL / temp) return 0; // Overflow check
            res = (res * temp) / i;
        }
    } else {
        for (uint32_t i = 2; i <= k; i++) res = (res * (n - i + 1)) / i;
    }
    return res;
}
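A quick sanity check (my addition, assuming the nchoosek() above is in scope):
#include <cstdint>
#include <iostream>

int main() {
    // 52 <= 62, so the unchecked branch runs and the result is exact.
    std::cout << nchoosek(52, 5) << '\n';    // prints 2598960
    // C(67, 33) fits in 64 bits, but an intermediate product does not,
    // so the checked branch signals this by returning 0.
    std::cout << nchoosek(67, 33) << '\n';   // prints 0
}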
Upvotes: 0
Reputation: 1
A method similar to the Sieve of Eratosthenes: where that sieve eliminates every multiple it visits, this one only partially cancels them. Since n!/((n-r)!r!) is always an integer, we can cancel the whole denominator first and only then multiply what remains, so the algorithm works well even without big integers.
In the sequence of natural numbers, every k-th number is divisible by k, so each factor of the denominator can be divided out of a corresponding numerator term. Doing this for k = 2, 3, 4, ... removes the denominator completely before any multiplication, which guarantees that if the final answer does not overflow, nothing overflows in the course of the calculation.
public static BigInteger Combination(int n, int r)
{
    if (n < 0 || r < 0 || r > n) throw new ArgumentException("Invalid parameter");
    if (n - r < r) r = n - r;
    if (r == 0) return 1;
    if (r == 1) return n;

    int[] numerator = new int[r];
    int[] denominator = new int[r];
    for (int k = 0; k < r; k++)
    {
        numerator[k] = n - r + k + 1;
        denominator[k] = k + 1;
    }
    for (int p = 2; p <= r; p++)
    {
        int pivot = denominator[p - 1];
        if (pivot > 1)
        {
            int offset = (n - r) % p;
            for (int k = p - 1; k < r; k += p)
            {
                numerator[k - offset] /= pivot;
                denominator[k] /= pivot;
            }
        }
    }

    BigInteger result = BigInteger.One;
    for (int k = 0; k < r; k++)
    {
        if (numerator[k] > 1) result *= numerator[k];
    }
    return result;
}
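For comparison, here is a minimal C++ sketch of the same idea (my addition, not part of the original answer): cancel the whole denominator before multiplying anything. It uses std::gcd (C++17) for the cancellation instead of the sieve-style stepping, and plain 64-bit integers instead of BigInteger:
#include <cstdint>
#include <numeric>   // std::gcd (C++17)
#include <vector>

// Cancels every denominator factor against the numerator terms before
// multiplying, so intermediates stay no larger than the final result.
uint64_t combination(uint64_t n, uint64_t r) {
    if (r > n) return 0;
    if (n - r < r) r = n - r;
    std::vector<uint64_t> num(r);
    for (uint64_t k = 0; k < r; ++k) num[k] = n - r + k + 1;
    for (uint64_t d = 2; d <= r; ++d) {
        uint64_t rest = d;
        for (uint64_t k = 0; k < r && rest > 1; ++k) {
            uint64_t g = std::gcd(rest, num[k]);
            num[k] /= g;
            rest /= g;
        }
    }
    uint64_t result = 1;
    for (uint64_t k = 0; k < r; ++k)
        result *= num[k];  // overflows only if C(n, r) itself does not fit in 64 bits
    return result;
}
Because every remaining numerator factor is at least 1, each partial product divides the final result, so the multiplication loop overflows only when C(n, r) itself does not fit in 64 bits.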
Upvotes: 0
Reputation: 308
This improves Howard Hinnant's answer (in this question) a little bit: calling gcd() on every iteration seems a bit slow. Instead, we can defer the gcd() work to a single call at the end, while still making the most of the standard algorithm from Knuth's book "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms":
const uint64_t u64max = std::numeric_limits<uint64_t>::max();

uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n)
        throw std::invalid_argument(std::string("invalid argument in ") + __func__);
    if (k > n - k)
        k = n - k;

    uint64_t r = 1;
    uint64_t d;
    for (d = 1; d <= k; ++d) {
        if (r > u64max / n)
            break;
        r *= n--;
        r /= d;
    }
    if (d > k)
        return r;

    // Let N be the original n, and let n be the current n (when we reach here).
    // We want to calculate C(N,k).
    // So far we have already calculated
    //     r = C(N, n) = C(N, N-n) = C(N, d-1)
    // Note that N-n = d-1.
    // In addition we know the following identity:
    //     C(N,k) = C(N,d-1) * C(N-d+1, k-d+1) / C(k, k-d+1)
    //            = C(N,d-1) * C(n, k-d+1)     / C(k, k-d+1)
    // Using this formula we effectively reduce the calculation,
    // while recursively reusing the same function.
    uint64_t b = choose(n, k - d + 1);
    if (b == u64max) {
        return u64max;   // overflow
    }
    uint64_t c = choose(k, k - d + 1);
    if (c == u64max) {
        return u64max;   // overflow
    }

    // Now the result should be r * b / c.
    // We can use gcd() to calculate this.
    // We pick b for the gcd: b < r in almost (if not absolutely) all cases.
    uint64_t g = gcd(b, c);
    b /= g;
    c /= g;
    r /= c;
    if (r > u64max / b)
        return u64max;   // overflow
    return r * b;
}
Note that the recursion depth is normally 2 (I haven't seen a case that goes to 3; the combinatorial reduction is quite effective), i.e. choose() is called 3 times in total in non-overflow cases.
Replace uint64_t with unsigned long long if you prefer it.
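A small usage sketch (my addition), assuming the choose() above and a gcd() helper are in the same translation unit, e.g. the gcd() from Howard Hinnant's answer below or C++17's std::gcd brought into scope:
#include <iostream>

int main() {
    // Small inputs never hit the overflow break, so the recursive path is not taken.
    std::cout << choose(52, 5) << '\n';   // prints 2598960
    // For larger inputs such as choose(67, 33) the recursive reduction described above kicks in.
}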
Upvotes: 2
Reputation: 944
Using a dirty trick with a long double, it is possible to get the same accuracy as Howard Hinnant's solution (and probably more):
unsigned long long n_choose_k(int n, int k)
{
    long double f = n;
    for (int i = 1; i < k + 1; i++)
        f /= i;
    for (int i = 1; i < k; i++)
        f *= n - i;
    unsigned long long f_2 = std::round(f);
    return f_2;
}
The idea is to divide by k! first and then to multiply by n(n-1)...(n-k+1). The floating-point approximation could be avoided by inverting the order of the for loops.
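A quick check (my addition; note the function needs <cmath> for std::round):
#include <cmath>
#include <iostream>

int main() {
    // The accumulated rounding error is far below 0.5 here, so std::round recovers the exact value.
    std::cout << n_choose_k(52, 5) << '\n';   // prints 2598960
}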
Upvotes: 1
Reputation: 56
One of the shortest ways:
int nChoosek(int n, int k) {
    if (k > n) return 0;
    if (k == 0) return 1;
    return nChoosek(n - 1, k) + nChoosek(n - 1, k - 1);
}
Upvotes: 0
Reputation:
Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating the factorial (see the linked example).
Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them with the sieve, and for each prime count how many of its multiples (with multiplicity) fall in the ranges [1, k] and [n-k+1, n]. The sieve is essentially an O(n log log n) algorithm, but no multiplication is done. Once the prime factorization is found, the actual number of multiplications necessary is at worst O(n log log n / log n), and there are probably faster ways than that.
prime_factors = []
n = 20
k = 10

composite = [True] * 2 + [False] * n
for p in xrange(n + 1):
    if composite[p]:
        continue
    q = p
    m = 1
    total_prime_power = 0
    prime_power = [0] * (n + 1)
    while True:
        prime_power[q] = prime_power[m] + 1
        r = q
        if q <= k:
            total_prime_power -= prime_power[q]
        if q > n - k:
            total_prime_power += prime_power[q]
        m += 1
        q += p
        if q > n:
            break
        composite[q] = True
    prime_factors.append([p, total_prime_power])

print prime_factors
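For a C++ take on the same idea (my addition, not part of the original answer), the exponent of each prime p in C(n, k) can also be computed directly with Legendre's formula, v_p(n!) = floor(n/p) + floor(n/p^2) + ..., which avoids the per-multiple bookkeeping:
#include <cstdint>
#include <iostream>
#include <vector>

// Exponent of the prime p in n! (Legendre's formula).
uint64_t factorial_exponent(uint64_t n, uint64_t p) {
    uint64_t e = 0;
    while (n) { n /= p; e += n; }
    return e;
}

// C(n, k) assembled from its prime factorization.
uint64_t choose_by_factorization(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    std::vector<bool> composite(n + 1, false);   // simple Sieve of Eratosthenes up to n
    uint64_t result = 1;
    for (uint64_t p = 2; p <= n; ++p) {
        if (composite[p]) continue;
        for (uint64_t q = p * p; q <= n; q += p) composite[q] = true;
        // Exponent of p in C(n, k) = v_p(n!) - v_p(k!) - v_p((n-k)!).
        uint64_t e = factorial_exponent(n, p) - factorial_exponent(k, p) - factorial_exponent(n - k, p);
        for (uint64_t i = 0; i < e; ++i) result *= p;   // partial products divide the final result
    }
    return result;
}

int main() {
    std::cout << choose_by_factorization(20, 10) << '\n';   // prints 184756
}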
Upvotes: 1
Reputation: 9798
Well, I have to answer my own question. I was reading about Pascal's triangle and by accident noticed that we can calculate the number of combinations with it:
#include <iostream>
#include <boost/cstdint.hpp>

boost::uint64_t Combinations(unsigned int n, unsigned int r)
{
    if (r > n)
        return 0;
    if (r == 0)
        return 1; // C(n, 0) == 1

    /** We can use Pascal's triangle to determine the number
     *  of combinations. To calculate a single row, each entry
     *  is obtained from the previous one:
     *
     *      v(i) = v(i - 1) * (n - i + 1) / i
     *
     *  Since the triangle is symmetrical, we only need to
     *  calculate up to the r-th column.
     */
    boost::uint64_t v = n--;
    for (unsigned int i = 2; i < r + 1; ++i, --n)
        v = v * n / i;
    return v;
}

int main()
{
    std::cout << Combinations(52, 5) << std::endl;
}
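For reference, the program prints 2598960. Each step is exact because, before the i-th iteration, v equals C(N, i - 1) for the original N, and C(N, i - 1) * (N - i + 1) = i * C(N, i), so the division by i never truncates.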
Upvotes: 2
Reputation: 218750
From Andreas' answer:
Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long:
unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
UPDATE: There's a small possibility that the algorithm will overflow on the line:
r *= n--;
for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()), which means an n less than roughly 4,000,000,000.
Consider n == 67 and k == 33. The above algorithm overflows with a 64-bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. Worse, the algorithm is silent about its overflow: choose(67, 33) returns 8,829,174,638,479,413, a believable but incorrect answer.
However the above algorithm can be slightly modified to never overflow as long as the final answer is representable.
The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:
r = r * n / d;
--n;
For this to be exact, it means that if you expanded r, n and d into their prime factorizations, you could cancel out d entirely and be left with a modified value for n, call it t, and then the computation of r is simply:
// compute t from r, n and d
r = r * t;
--n;
A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
--n;
Now we can do the same thing with d_temp and n (find the greatest common divisor). However, since we know a priori that r * n / d is exact, we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can divide n by d_temp:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
// now one can divide n by d/g without truncation
unsigned long long t = n / d_temp;
r = r * t;
--n;
Cleaning up:
unsigned long long
gcd(unsigned long long x, unsigned long long y)
{
    while (y != 0)
    {
        unsigned long long t = x % y;
        x = y;
        y = t;
    }
    return x;
}

unsigned long long
choose(unsigned long long n, unsigned long long k)
{
    if (k > n)
        throw std::invalid_argument("invalid argument in choose");
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d, --n)
    {
        unsigned long long g = gcd(r, d);
        r /= g;
        unsigned long long t = n / (d / g);
        if (r > std::numeric_limits<unsigned long long>::max() / t)
            throw std::overflow_error("overflow in choose");
        r *= t;
    }
    return r;
}
Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
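A usage sketch (my addition), exercising both cases just mentioned; the <stdexcept> and <limits> headers are also what the functions above rely on:
#include <iostream>
#include <limits>
#include <stdexcept>

int main() {
    try {
        std::cout << choose(67, 33) << '\n';   // prints 14226520737620288370
        std::cout << choose(68, 33) << '\n';   // throws: the result no longer fits in 64 bits
    } catch (const std::overflow_error& e) {
        std::cout << e.what() << '\n';         // "overflow in choose"
    }
}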
Upvotes: 36
Reputation:
The following routine will compute n-choose-k using the recursive definition and memoization. The routine is extremely fast and accurate:
inline unsigned long long n_choose_k(const unsigned long long& n,
                                     const unsigned long long& k)
{
    if (n  < k) return 0;
    if (0 == n) return 0;
    if (0 == k) return 1;
    if (n == k) return 1;
    if (1 == k) return n;

    typedef unsigned long long value_type;

    value_type* table = new value_type[static_cast<std::size_t>(n * n)];
    std::fill_n(table, n * n, 0);

    class n_choose_k_impl
    {
    public:
        n_choose_k_impl(value_type* table, const value_type& dimension)
        : table_(table),
          dimension_(dimension)
        {}

        inline value_type& lookup(const value_type& n, const value_type& k)
        {
            return table_[dimension_ * n + k];
        }

        inline value_type compute(const value_type& n, const value_type& k)
        {
            if ((0 == k) || (k == n))
                return 1;
            value_type v1 = lookup(n - 1, k - 1);
            if (0 == v1)
                v1 = lookup(n - 1, k - 1) = compute(n - 1, k - 1);
            value_type v2 = lookup(n - 1, k);
            if (0 == v2)
                v2 = lookup(n - 1, k) = compute(n - 1, k);
            return v1 + v2;
        }

        value_type* table_;
        value_type dimension_;
    };

    value_type result = n_choose_k_impl(table, n).compute(n, k);
    delete [] table;
    return result;
}
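A usage sketch (my addition; the routine above also needs <algorithm> for std::fill_n and <cstddef> for std::size_t):
#include <algorithm>
#include <cstddef>
#include <iostream>

int main() {
    std::cout << n_choose_k(52, 5) << '\n';   // prints 2598960, built up via Pascal's rule in the memo table
}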
Upvotes: 6
Reputation: 13201
If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:
std::vector<unsigned long long> prev_row(n + 1, 0), current_row(n + 1, 0);
for (int i = 0; i <= n; i++) {
    for (int j = 0; j <= i; j++) {
        if (j == 0) current_row[j] = 1;
        // prev_row is zero beyond the entries filled so far, so the last
        // element of each row correctly comes out as 1.
        else current_row[j] = prev_row[j] + prev_row[j - 1];
    }
    prev_row = current_row;
}
// result is now in current_row[r]
However, this algorithm is much slower than the multiplicative one. So perhaps you could use multiplication to generate all the cases you know are 'safe' and then switch to addition from there (or you could just use a BigInt library).
Upvotes: -1
Reputation: 52519
Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long:
unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
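Why each division is exact: after the d-th pass through the loop, r equals C(N, d) for the original N, and the product formed in that pass satisfies C(N, d - 1) * (N - d + 1) = d * C(N, d), so r /= d never truncates.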
UPDATE: There's a small possibility that the algorithm will overflow on the line:
r *= n--;
for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()), which means an n less than roughly 4,000,000,000.
Upvotes: 44
Reputation: 327
Remember that
n! / (n - r)! = n * (n - 1) * ... * (n - r + 1)
so it's way smaller than n!. So the solution is to evaluate n * (n - 1) * ... * (n - r + 1) instead of first calculating n! and then dividing.
Of course it all depends on the relative magnitude of n and r - if r is relatively big compared to n, then it still won't fit.
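For the 52-choose-5 case from the question: 52! / 47! = 52 * 51 * 50 * 49 * 48 = 311,875,200, which fits comfortably in 64 bits, and dividing by 5! = 120 gives the final answer 2,598,960.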
Upvotes: 4