coding_pleasures
coding_pleasures

Reputation: 879

Sum of maximal element from every possible subset of size 'k' of an array

I have a very large list comprising of about 10,000 elements and each element is an integer as big as 5 billion. I would like to find the sum of maximal elements from every possible subset of size 'k' (given by the user) of an array whose maximum size is 10,000 elements. The only solution that comes to my head is to generate each of the subset (using itertools) and find its maximum element. But this would take an insane amount of time! What would be a pythonic way to solve this?

Upvotes: 3

Views: 843

Answers (1)

alexis
alexis

Reputation: 50190

Don't use python, use mathematics first. This is a combinatorial problem: If you have an array S of n numbers (n large), and generate all possible subsets of size k, you want to calculate the sum of the maximal elements of the subsets.

Assuming the numbers are all distinct (though it also works if they are not), you can calculate exactly how often each will appear in a subset, and go on from there without ever actually constructing a subset. You should have taken it over to math.stackexchange.com, they'd have sorted you out in a jiffy. Here it is, but without the nice math notation:

Sort your array in increasing order and let S_1 be the smallest (first) number, S_2 the next smallest, and so on. (Note: Indexing from 1).

  1. S_n, the largest element, is obviously the maximal element of any subset it is part of, and there are exactly (n-1 choose k-1) such subsets.

  2. Of the subsets that don't contain S_n, there are (n-2 choose k-1) subsets that contain S_{n-1}, in which it is the largest element.

  3. Continue this until you come down to S_k, the k-th smallest number (counting from the smallest), which will be the maximum of exactly one subset: (k-1 choose k-1) = 1. Smaller numbers (S_1 to S_{k-1}) can never be maximal: Every set of k elements will contain something larger.

  4. Sum the above (n-k+1 terms), and there's your answer:

    S_n*(n-1 choose k-1) + S_{n-1}*(n-2 choose k-1) + ... + S_k*(k-1 choose k-1)
    

    Writing the terms from smallest to largest, this is just the sum

    Sum(i=k..n) S_i * (i-1 choose k-1)    
    

If we were on math.stackexchange you'd get it in the proper mathematical notation, but you get the idea.

Upvotes: 6

Related Questions