Reputation: 6752
In Parallel Python, why is it necessary to pass, in the job submission call, any modules the submitted function will need, along with variables and namespaces? How necessary is it to preserve module-level "global" variables (if that's all that's going on)?
submit function:
submit(self, func, args=(), depfuncs=(), modules=(), callback=None, callbackargs=(),group='default', globals=None)
Submits function to the execution queue
func - function to be executed
args - tuple with arguments of the 'func'
depfuncs - tuple with functions which might be called from 'func'
modules - tuple with module names to import
callback - callback function which will be called with argument
list equal to callbackargs+(result,)
as soon as calculation is done
callbackargs - additional arguments for callback function
group - job group, is used when wait(group) is called to wait for
jobs in a given group to finish
globals - dictionary from which all modules, functions and classes
will be imported, for instance: globals=globals()
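For concreteness, here is a minimal sketch of how those parameters are typically used. This is untested and assumes nothing beyond the submit() signature quoted above plus pp.Server() with its defaults; the function and variable names are my own.

import pp

def hypot(x, y):
    # math is never imported here; the worker gets it only because "math"
    # is listed in `modules` below (each worker starts with a clean namespace).
    return math.sqrt(x * x + y * y)

job_server = pp.Server()  # spawns the worker processes

# Module names are passed as strings; pp re-imports them inside each worker.
job = job_server.submit(hypot, args=(3.0, 4.0), modules=("math",))

print(job())  # calling the job object waits for and returns the result: 5.0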
Upvotes: 3
Views: 536
Reputation: 65893
The reason that pp works the way it does is that it creates a fresh instance of the Python interpreter for every worker, completely independent from anything that has run before or since. This ensures that there are no unintended side-effects, such as __future__ imports being active in the worker process. The downside is that it makes things far more complicated to get right and, in my experience with pp, not particularly robust. pp does try to make things a bit easier for the user, but in doing so it seems to introduce more problems than it solves.
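To see why the explicit hand-off matters, consider the following sketch (untested, using only the submit() parameters quoted in the question, with hypothetical helper names): because each worker's interpreter has never seen the parent's module-level names, a helper function defined next to func has to be shipped via depfuncs, just as modules go in modules and a whole namespace can go in globals=globals().

import pp

def double(x):
    return 2 * x

def doubled_sum(a, b):
    # double() exists at module level in the parent process only; the
    # worker's fresh interpreter knows nothing about it unless it is
    # passed along explicitly.
    return double(a) + double(b)

job_server = pp.Server()

# Without depfuncs=(double,) the worker would raise a NameError.
job = job_server.submit(doubled_sum, args=(3, 4), depfuncs=(double,))
print(job())  # 14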
If I were writing code designed for use on a cluster from the start, I would probably end up using pp, but I've found that adapting existing code to work with pp is a nightmare.
Upvotes: 3