Yury Kirienko

Reputation: 2029

itertools.ifilter with IPython Parallel

For a problem [proven to be NP-hard] I have no option but exhaustive search. I have a set of data — for simplicity, S = ['A', 'B', 'C', ..., 'Z'] — and want to apply a function f to all subsets of length N < len(S) of this set. I cannot use lists here, since the binomial coefficient binom(len(S), N) runs into the billions. But the result of f(x) is zero for almost all subsets x ⊆ S. Therefore, in simple cases everything works fine with

   from itertools import ifilter, combinations
   answer = list(ifilter(lambda x: f(x) > 0, combinations(S,N)))
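As an aside, `ifilter` exists only in Python 2; on Python 3 the built-in `filter` is already lazy, so the same pattern works without it (a toy `f`, `S`, and `N` are assumed here purely for illustration):

```python
from itertools import combinations

S = ['A', 'B', 'C', 'D', 'E']        # toy data set (assumed for illustration)
N = 3
f = lambda x: 1 if 'A' in x else 0   # hypothetical scoring function

# On Python 3, filter is lazy, so nothing is materialized until list() runs.
answer = list(filter(lambda x: f(x) > 0, combinations(S, N)))
```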

But in real life, len(S) ~ 10⁴ and N ~ 10². What I want is to spread the work among CPU engines using ipyparallel. I have a small cluster with a hundred CPU cores. But I still cannot afford to store the combinations as lists, so I need something like separate, independent generators.

There are a couple of examples of how to split a generator into chunks, but as far as I understand they are still consecutive generators. There is also an idea from @minrk that is related, but it performs really badly for some reason.
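One way to get truly independent chunks is combination unranking: the `nth_combination` recipe from the itertools documentation maps a lexicographic rank directly to a combination, so each engine can be handed a disjoint rank range and generate its own combinations with no shared state. A minimal sketch (the `chunk` helper is a hypothetical name, not part of itertools):

```python
from itertools import combinations
from math import comb

def nth_combination(iterable, r, index):
    """Equivalent to list(combinations(iterable, r))[index], without
    materializing the list (recipe from the itertools documentation)."""
    pool = tuple(iterable)
    n = len(pool)
    c = comb(n, r)
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError("index out of range")
    result = []
    while r:
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])
    return tuple(result)

def chunk(S, N, start, stop):
    """Lazily yield the N-combinations of S with lexicographic
    ranks in [start, stop) — one such range per engine."""
    for i in range(start, stop):
        yield nth_combination(S, N, i)
```

Each unranking costs more than advancing `itertools.combinations` by one step, so at the question's scale this only moves the bottleneck around rather than removing it — which is the point of the answer below.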

So the questions are:

Upvotes: 4

Views: 176

Answers (1)

user2357112

Reputation: 281551

Exhaustive search is utterly hopeless here, no matter how you parallelize it. With len(S) and N at such high orders of magnitude, you would need to search through about 6e241 solution candidates. This is far beyond the capacity of any computational system humanity could ever hope to build.

You will need to use a smarter algorithm.
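The scale of that number can be checked directly with `math.comb`, assuming len(S) = 10⁴ and N = 10² as in the question:

```python
from math import comb

# Binomial coefficient at the question's scale: len(S) = 10**4, N = 10**2.
candidates = comb(10**4, 10**2)

# The count has 242 decimal digits, i.e. it is on the order of 6e241.
print(len(str(candidates)))
```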

Upvotes: 1
