Reputation: 395
While trying to use SciPy's optimization algorithms to minimize a function that computes its value in a subprocess, I discovered that gradient-based algorithms (basinhopping and L-BFGS-B so far) fail with the following error on line 562 of optimize.py:
grad[k] = (f(*((xk + d,) + args)) - f0) / d[k]
TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'
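For context, that subtraction fails whenever the objective returns None for both evaluation points. A minimal, hypothetical reproduction of just the failing arithmetic (not the full basinhopping setup):

```python
# If the objective returns None (e.g. because its body never reaches a
# return statement), the finite-difference step subtracts None from None.
def broken_objective(x):
    return None  # stand-in for a function that falls through without returning

f0 = broken_objective(5.0)
fk = broken_objective(5.0 + 1e-8)
try:
    grad = (fk - f0) / 1e-8
except TypeError as err:
    message = str(err)
print(message)
```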
Here's a simple example of code that generates this error:
import multiprocessing as mp
from scipy.optimize import basinhopping

def runEnvironment(x):
    return x**2

def func(x):
    if __name__ == '__main__':
        print "x:", x
        pool = mp.Pool(processes=1)
        results = pool.apply(runEnvironment, (x,))
        pool.close()
        return results

x0 = 5
ret = basinhopping(func, x0, niter=100, T=1.0, stepsize=0.1, minimizer_kwargs=None, take_step=None, accept_test=None, callback=None, interval=50, disp=False, niter_success=None)
Note that this code runs fine if the multiprocessing components are removed, or if a non-gradient-based algorithm (like COBYLA) is used. Can anyone think of a reason this is happening?
Upvotes: 1
Views: 2216
Reputation: 879849
It is inefficient to create many little mp.Pools, each with only one worker process. It is also inefficient to create one Pool per call to func, since func is called many times. Instead, create one Pool at the outset of your program, and pass the pool to each call to func:
if __name__ == '__main__':
    pool = mp.Pool()
    x0 = 5
    ret = optimize.basinhopping(
        func, x0, niter=100, T=1.0, stepsize=0.1,
        minimizer_kwargs=dict(args=pool),
        take_step=None, accept_test=None, callback=None,
        interval=50, disp=False, niter_success=None)
    pool.close()
    print(ret)
The minimizer_kwargs=dict(args=pool) tells optimize.basinhopping to pass pool as an additional argument to func.
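A toy sketch of the same forwarding mechanism in isolation (this is an assumed, simplified objective, not the poster's code; it uses the documented tuple form of args with the local minimizer directly):

```python
# Extra positional arguments listed in args=(...) are forwarded to the
# objective after x, which is how the pool reaches func above.
from scipy.optimize import minimize

def func(x, scale):
    # `scale` arrives as the extra argument supplied via args=(2.0,)
    return scale * (x[0] - 3.0) ** 2

res = minimize(func, x0=[0.0], args=(2.0,), method='L-BFGS-B')
print(res.x[0])
```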
You can also use logger = mp.log_to_stderr(logging.INFO) to obtain logging statements which show in what process the functions are being called. For example,
import multiprocessing as mp
from scipy import optimize
import logging

logger = mp.log_to_stderr(logging.INFO)

def runEnvironment(x):
    logger.info('runEnvironment({}) called'.format(x))
    return x**2

def func(x, pool):
    logger.info('func({}) called'.format(x))
    results = pool.apply(runEnvironment, (x,))
    return results

if __name__ == '__main__':
    pool = mp.Pool()
    x0 = 5
    ret = optimize.basinhopping(
        func, x0, niter=100, T=1.0, stepsize=0.1,
        minimizer_kwargs=dict(args=pool),
        take_step=None, accept_test=None, callback=None,
        interval=50, disp=False, niter_success=None)
    pool.close()
    print(ret)
prints
[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-4] child process calling self.run()
[INFO/MainProcess] func([ 5.]) called
[INFO/PoolWorker-1] runEnvironment([ 5.]) called
[INFO/MainProcess] func([ 5.00000001]) called
[INFO/PoolWorker-2] runEnvironment([ 5.00000001]) called
[INFO/MainProcess] func([ 5.]) called
[INFO/PoolWorker-3] runEnvironment([ 5.]) called
[INFO/MainProcess] func([-5.]) called
[INFO/PoolWorker-4] runEnvironment([-5.]) called
This shows that func is always called by the main process, while runEnvironment is run by worker processes. Note that the calls to func happen sequentially. To get any benefit out of pool, you'll have to exercise more processors with each call to func.
Upvotes: 0
Reputation: 741
Your if __name__ == '__main__': guard is in the wrong position; rearranging it this way works:
import multiprocessing as mp
from scipy.optimize import basinhopping

def runEnvironment(x):
    return x**2

def func(x):
    print "x:", x
    pool = mp.Pool(processes=1)
    results = pool.apply(runEnvironment, (x,))
    pool.close()
    return results

if __name__ == '__main__':
    x0 = 5
    ret = basinhopping(func, x0, niter=100, T=1.0, stepsize=0.1, minimizer_kwargs=None, take_step=None, accept_test=None, callback=None, interval=50, disp=False, niter_success=None)
Upvotes: 1