Reputation: 119
Here’s a quick example of I’m trying to do and the error I’m getting:
for symbol in itertools.product(list_a, repeat=8):
list_b.append(symbol)
I’m also afterwards excluding combinations from that list like so:
for combination in list_b:
valid_b = True
for symbols in range(len(list_exclude)):
if list_exclude[symbols] in combination:
valid_b = False
else:
pass
if valid_b:
new_list.append(combination)
I’ve heard somehow chunking the process might help, not sure how that could be done here though.
I’m using multiprocessing for this as well.
When I run it I get “MemoryError”
How would you go about it?
Upvotes: 1
Views: 243
Reputation: 114518
Don't pre-compute anything, especially not the first full list:
def symbols(lst, exclude):
for symbol in map(''.join, itertools.product(lst, repeat=8)):
if any(map(symbol.__contains__, exclude)):
continue
yield symbol
Now use the generator as you need to lazily evaluate the elements. Keep in mind that since it's pre-filtering the data, even list(symbols(list_a, list_exclude))
will he much cheaper than what you originally wrote.
Here is a breakdown of what happens:
itertools.product
is a generator. That means that it produces an output without retaining a reference to any previous items. Each element it returns is a tuple
containing some combination of the input elements.
Since you want to compare strings, you need to convert the tuple
s. Hence, ''.join
. Mapping it onto each of the tuple
s that itertools.product
produces converts those elements into strings. For example:
>>> ''.join(('$', '$', '&', '&', '♀', '@', '%', '$'))
'$$&&♀@%$'
Filtering each symbol
thus created can be done by checking if any of the items in excludes
are contained in it. You can do this with something like
[ex in symbol for ex in exclude]
The operation ... in symbol
is implemented via the magic method symbol.__contains__
. You can therefore map
that method to every element of exclude
.
Since the first element of exclude
that is contained in symbol
invalidates it, you don't need to check the remainder. This is called short-circuiting, and is implemented in the any
function. Notice that because map
is a generator, the remaining elements will actually not be computed once a match is found. This is different from using a list comprehension, which pre-computed all the elements.
Putting yield
into your function turns it into a generator function. That means that when you call symbols(...)
, it returns a generator object that you can iterate over. This object does not pre-compute anything until you call next
on it. So if you write the data to a file (for example), only the current result will be in memory at once. It may take a long time to write out a large number of results but your memory usage should not spike at all from it.
Upvotes: 3
Reputation: 49
This little change i made could save you a bit of ram usage.
for combination in list_b:
valid_b = True
for symbols in list_exclude:
if symbols in combination:
valid_b = False
else:
pass
if valid_b:
new_list.append(combination)
Upvotes: -1