Reputation: 181
Imagine we have a list of stocks:
stocks = ['AAPL','GOOGL','IBM']
The specific stocks don't matter; what matters is that we have n items in this list.
Imagine we also have a list of weights, from 0% to 100%:
weights = list(range(101))
Given n = 3 (or any other number), I need to produce a matrix with every possible combination of weights that sums to exactly 100%. E.g.
0%, 0%, 100%
1%, 0%, 99%
0%, 1%, 99%
etc...
Is there some method of itertools that can do this? Something in numpy? What is the most efficient way to do this?
Upvotes: 3
Views: 856
Reputation: 30601
This is a classic stars and bars problem, and Python's itertools module does indeed provide a solution that's both simple and efficient, without any additional filtering needed.
Some explanation first: you want to divide 100 "points" between 3 stocks in all possible ways. For illustration purposes, let's reduce to 10 points instead of 100, with each one worth 10% instead of 1%. Imagine writing those points as a string of ten * characters:
**********
These are the "stars" of "stars and bars". Now, to divide the ten stars amongst the 3 stocks, we insert two | divider characters (the "bars" of "stars and bars"). For example, one such division might look like this:
**|*******|*
This particular combination of stars and bars would correspond to the division 20% AAPL, 70% GOOGL, 10% IBM. Another division might look like:
******||****
which would correspond to 60% AAPL, 0% GOOGL, 40% IBM.
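A quick way to see this correspondence in code (a minimal sketch; `division_from_string` is a hypothetical helper, not part of itertools): splitting the string on the bar character leaves one run of stars per stock, and the length of each run is that stock's share.

```python
def division_from_string(s):
    """Split a stars-and-bars string on '|' and count the stars in each piece."""
    return tuple(len(piece) for piece in s.split("|"))


# In this reduced example each star is worth 10%.
for s in ["**|*******|*", "******||****"]:
    print(s, [10 * count for count in division_from_string(s)])
```

Empty pieces (two adjacent bars, or a bar at either end) simply come out as zero, which is why "******||****" gives 0% for the middle stock.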
It's easy to convince yourself that every string consisting of ten * characters and two | characters corresponds to exactly one possible division of the ten points amongst the three stocks.
So to solve your problem, all we need to do is generate all possible strings containing ten * star characters and two | bar characters. Or, to think of this another way, we want to find all possible pairs of positions at which we can place the two bar characters in a string of total length twelve. Python's itertools.combinations function can give us those possible positions (for example with itertools.combinations(range(12), 2)), and then it's simple to translate each pair of positions back to a division of the ten points into three pieces: insert an extra imaginary divider character at the start and end of the string, then find the number of stars between each pair of dividers. That number of stars is simply one less than the distance between the two dividers.
Here's the code:
import itertools

def all_partitions(n, k):
    """
    Generate all partitions of range(n) into k pieces.
    """
    for c in itertools.combinations(range(n+k-1), k-1):
        yield tuple(y-x-1 for x, y in zip((-1,) + c, c + (n+k-1,)))
For the case you give in the question, you want all_partitions(100, 3). But that yields 5151 partitions, starting with (0, 0, 100) and ending with (100, 0, 0), so it's impractical to show the results here. Instead, here are the results in a smaller case:
>>> for partition in all_partitions(5, 3):
...     print(partition)
...
(0, 0, 5)
(0, 1, 4)
(0, 2, 3)
(0, 3, 2)
(0, 4, 1)
(0, 5, 0)
(1, 0, 4)
(1, 1, 3)
(1, 2, 2)
(1, 3, 1)
(1, 4, 0)
(2, 0, 3)
(2, 1, 2)
(2, 2, 1)
(2, 3, 0)
(3, 0, 2)
(3, 1, 1)
(3, 2, 0)
(4, 0, 1)
(4, 1, 0)
(5, 0, 0)
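Applied to the question's three stocks, a sketch might look like the following. It repeats the generator above so it runs on its own; `matrix` is just an illustrative name for a list with one row per allocation, each value a whole-number percentage.

```python
import itertools


def all_partitions(n, k):
    """Same generator as above: every way to split n points into k non-negative parts."""
    for c in itertools.combinations(range(n + k - 1), k - 1):
        yield tuple(y - x - 1 for x, y in zip((-1,) + c, c + (n + k - 1,)))


stocks = ['AAPL', 'GOOGL', 'IBM']
matrix = list(all_partitions(100, len(stocks)))
print(len(matrix))              # 5151
print(matrix[0], matrix[-1])    # (0, 0, 100) (100, 0, 0)
# Label a single row with the stock names.
print(dict(zip(stocks, matrix[1])))  # {'AAPL': 0, 'GOOGL': 1, 'IBM': 99}
```

If NumPy is available, numpy.array(matrix) turns this list of tuples into a 5151 x 3 integer array.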
Upvotes: 0
Reputation: 365925
The way to optimize this isn't to find a faster way to generate the permutations; it's to generate as few permutations as possible.
First, how would you do this if you only wanted the combinations that were in sorted order?
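You don't need to generate all possible combinations of 0 to 100 and then filter them. If the numbers are in sorted order (a <= b <= c), the first number, a, can be anywhere from 0 to 33 (that is, 100 // 3). The second number, b, can be anywhere from a to (100-a) // 2. The third number, c, can only be 100-a-b. So:
def sorted_triples():
    # sorted_triples is just a name for illustration: a <= b <= c and a + b + c == 100
    for a in range(0, 100 // 3 + 1):
        for b in range(a, (100 - a) // 2 + 1):
            c = 100 - a - b
            yield a, b, c
Now, instead of generating all 101*101*101 (just over a million) triples and filtering them down to the 884 sorted ones, we're generating only those 884, a speedup of over 1000x.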
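A self-contained check of those numbers (a sketch; the function repeats the sorted loop with a hypothetical `total` parameter) compares the direct loop with brute-force filtering of every candidate triple:

```python
import itertools


def sorted_triples(total=100):
    """Yield (a, b, c) with a <= b <= c and a + b + c == total."""
    for a in range(total // 3 + 1):
        for b in range(a, (total - a) // 2 + 1):
            yield a, b, total - a - b


direct = list(sorted_triples())
# Brute force: scan all 101**3 triples and keep the sorted ones that sum to 100.
brute = [t for t in itertools.product(range(101), repeat=3)
         if sum(t) == 100 and t[0] <= t[1] <= t[2]]
print(len(direct), direct == brute)  # 884 True
```

Both lists come out in lexicographic order, so they can be compared directly.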
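However, keep in mind that, once you count every ordering, there are still C(X+N-1, N-1) answers (5151 for X=100 and N=3), which for X much larger than N grows roughly like X**(N-1) / (N-1)!. So computing them directly, instead of filtering all (X+1)**N candidates, may be optimal, but the number of results still grows very quickly with N. And there's no way around that; you want that many results, after all.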
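To put numbers on this, here is a small sketch comparing the stars-and-bars count of answers with the number of candidates a full product(range(101), repeat=n) scan would visit (`count_answers` is a hypothetical helper; math.comb needs Python 3.8+):

```python
import math


def count_answers(x, n):
    """Number of ordered n-tuples of non-negative ints that sum to x (stars and bars)."""
    return math.comb(x + n - 1, n - 1)


for n in (2, 3, 4, 5):
    candidates = 101 ** n  # tuples visited by product(range(101), repeat=n)
    print(n, count_answers(100, n), candidates)
```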
You can look for ways to make the first part more concise with itertools.product combined with reduce or accumulate, but I think it's going to end up less readable. And you want to be able to extend to arbitrary N, and also to get all permutations rather than just the sorted ones. So keep it understandable until you do that, and then look for ways to condense it after you're done.
For arbitrary N, you obviously need to go through N steps, with either a loop or recursion. I think this is easier to understand with recursion.
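When n is 1, the only combination is (x,).
Otherwise, to keep each combination in sorted order, the first value a must be at least the previous value (lo, which starts at 0) and at most x // n, since it is the smallest of the n numbers. For each such a, you can have that value, together with all of the sorted combinations of n-1 numbers, each at least a, that sum to x-a. So:
def sum_to_x(x, n, lo=0):
    # Yields the sorted (non-decreasing) n-tuples of ints >= lo that sum to x.
    if n == 1:
        if x >= lo:
            yield (x,)
        return
    for a in range(lo, x // n + 1):
        for result in sum_to_x(x-a, n-1, a):
            yield (a, *result)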
Now you just need to add in the permutations, and you're done:
import itertools

def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from itertools.permutations(combi)
But there's one problem: permutations permutes positions, not values. So if you have, say, (100, 0, 0), the six permutations of that are (100, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 100), (0, 100, 0), (0, 0, 100).
If N is very small (as it is in your example, with N=3 and X=100), it may be fine to just generate all 6 permutations of each combination and filter out the duplicates with a set:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from set(itertools.permutations(combi))
… but if N can grow large, we're talking about a lot of wasted work there as well.
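As an end-to-end sanity check of the set-based version, here is a self-contained sketch. It assumes sum_to_x yields each multiset once in sorted order (here via a lower-bound parameter `lo`, an assumption of this sketch), and compares the result with brute-force filtering of all 101**3 candidates:

```python
import itertools


def sum_to_x(x, n, lo=0):
    """Yield sorted (non-decreasing) n-tuples of ints >= lo that sum to x."""
    if n == 1:
        if x >= lo:
            yield (x,)
        return
    # The first element is the smallest, so it can be at most x // n.
    for a in range(lo, x // n + 1):
        for rest in sum_to_x(x - a, n - 1, a):
            yield (a, *rest)


def perm_sum_to_x(x, n):
    """Expand each sorted combination into its distinct orderings."""
    for combi in sum_to_x(x, n):
        yield from set(itertools.permutations(combi))


results = list(perm_sum_to_x(100, 3))
expected = [t for t in itertools.product(range(101), repeat=3) if sum(t) == 100]
print(len(results), len(set(results)), sorted(results) == expected)  # 5151 5151 True
```

Because every ordered triple has exactly one sorted form, expanding the 884 sorted combinations this way reaches each of the 5151 answers exactly once.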
There are plenty of good answers here on how to do permutations without repeated values. See this question, for example. Borrowing an implementation from that answer:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        # unique_permutations is the helper from the linked answer
        yield from unique_permutations(combi)
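Since the linked answer isn't reproduced here, one possible stand-in for unique_permutations, using only the standard library, is the classic next-lexicographic-permutation algorithm. This is a sketch of that swapped-in technique, not the linked implementation, and it assumes the items are mutually comparable:

```python
def unique_permutations(seq):
    """Yield each distinct ordering of seq exactly once, in lexicographic order."""
    items = sorted(seq)
    while True:
        yield tuple(items)
        # Find the rightmost i with items[i] < items[i + 1].
        i = len(items) - 2
        while i >= 0 and items[i] >= items[i + 1]:
            i -= 1
        if i < 0:
            return  # items is in descending order, so that was the last permutation
        # Find the rightmost j with items[j] > items[i], and swap the two.
        j = len(items) - 1
        while items[j] <= items[i]:
            j -= 1
        items[i], items[j] = items[j], items[i]
        # Reverse the suffix to get the next-smallest arrangement.
        items[i + 1:] = reversed(items[i + 1:])


print(list(unique_permutations((100, 0, 0))))  # [(0, 0, 100), (0, 100, 0), (100, 0, 0)]
```

Because equal values are never swapped past each other, repeated values produce no duplicate orderings, so no set is needed.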
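Or, if we can drag in SymPy or more-itertools:
from sympy.utilities.iterables import multiset_permutations

def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        # multiset_permutations yields lists rather than tuples
        yield from multiset_permutations(combi)

import more_itertools

def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from more_itertools.distinct_permutations(combi)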
Upvotes: 5
Reputation: 39072
What you need is combinations_with_replacement, because in your question you wrote 0, 0, 100, which means you expect repetition, like 20, 20, 60, etc.
from itertools import combinations_with_replacement
weights = range(11)
n = 3
# Avoid naming this "list": that would shadow the built-in list type.
result = [i for i in combinations_with_replacement(weights, n) if sum(i) == 10]
print(result)
The above code results in
[(0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), (0, 4, 6), (0, 5, 5), (1, 1, 8), (1, 2, 7), (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4)]
Replace range(11), n and sum(i) == 10 with whatever you need.
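To avoid editing three places by hand, the comprehension can be wrapped in a function taking the total and the number of items (a sketch; `sorted_weight_combos` is a hypothetical name). Note that each tuple it returns is sorted, so an allocation such as (1, 0, 99) appears only as (0, 1, 99).

```python
from itertools import combinations_with_replacement


def sorted_weight_combos(total, n):
    """Non-decreasing n-tuples drawn from range(total + 1) whose values sum to total."""
    return [combo for combo in combinations_with_replacement(range(total + 1), n)
            if sum(combo) == total]


print(sorted_weight_combos(10, 3)[:3])    # [(0, 0, 10), (0, 1, 9), (0, 2, 8)]
print(len(sorted_weight_combos(100, 3)))  # 884
```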
Upvotes: 0
Reputation: 5012
What you are looking for is product from the itertools module. You can use it as shown below:
from itertools import product
weights = list(range(101))
n = 3
lst_of_weights = [i for i in product(weights,repeat=n) if sum(i)==100]
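The comprehension above visits all 101**n tuples. A variant of the same idea (a sketch; `weights_summing_to` is a hypothetical name) uses product for only the first n-1 weights and derives the last one, so it scans 101**(n-1) candidates instead:

```python
from itertools import product


def weights_summing_to(total, n):
    """Ordered n-tuples of non-negative ints that sum to total."""
    for head in product(range(total + 1), repeat=n - 1):
        last = total - sum(head)
        if last >= 0:  # skip heads that already exceed the total
            yield head + (last,)


rows = list(weights_summing_to(100, 3))
print(len(rows))  # 5151
```

The rows come out in the same order as the filtered product, starting with (0, 0, 100) and (0, 1, 99).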
Upvotes: 0