Reputation: 879
I have a very large list comprising of about 10,000 elements and each element is an integer as big as 5 billion. I would like to find the sum of maximal elements from every possible subset of size 'k' (given by the user) of an array whose maximum size is 10,000 elements. The only solution that comes to my head is to generate each of the subset (using itertools) and find its maximum element. But this would take an insane amount of time! What would be a pythonic way to solve this?
Upvotes: 3
Views: 843
Reputation: 50190
Don't use python, use mathematics first. This is a combinatorial problem: If you have an array S
of n numbers (n large), and generate all possible subsets of size k, you want to calculate the sum of the maximal elements of the subsets.
Assuming the numbers are all distinct (though it also works if they are not), you can calculate exactly how often each will appear in a subset, and go on from there without ever actually constructing a subset. You should have taken it over to math.stackexchange.com
, they'd have sorted you out in a jiffy. Here it is, but without the nice math notation:
Sort your array in increasing order and let S_1
be the smallest (first) number,
S_2
the next smallest, and so on. (Note: Indexing from 1).
S_n
, the largest element, is obviously the maximal element of any subset
it is part of, and there are exactly (n-1 choose k-1)
such subsets.
Of the subsets that don't contain S_n, there are (n-2 choose k-1)
subsets that contain S_{n-1}
, in which it is the largest element.
Continue this until you come down to S_k
, the k-th
smallest number
(counting from the smallest), which will be the maximum of exactly one
subset: (k-1 choose k-1) = 1
. Smaller numbers (S_1
to S_{k-1}
)
can never be maximal: Every set of k
elements will contain something
larger.
Sum the above (n-k+1 terms)
, and there's your answer:
S_n*(n-1 choose k-1) + S_{n-1}*(n-2 choose k-1) + ... + S_k*(k-1 choose k-1)
Writing the terms from smallest to largest, this is just the sum
Sum(i=k..n) S_i * (i-1 choose k-1)
If we were on math.stackexchange you'd get it in the proper mathematical notation, but you get the idea.
Upvotes: 6