Reputation: 450
I'm using Python's multiprocessing lib to speed up some code (least squares fitting with scipy).
It works fine on 3 different machines, but it shows a strange behaviour on a 4th machine.
The code:
import numpy as np
from scipy.optimize import least_squares
import time
import parmap
from multiprocessing import Pool
p0 = [1., 1., 0.5]
def f(p, xx):
return p[0]*np.exp(-xx ** 2 / p[1] ** 2) + p[2]
def errorfunc(p, xx, yy):
return f(p, xx) - yy
def do_fit(yy, xx):
return least_squares(errorfunc, p0[:], args=(xx, yy))
if __name__ == '__main__':
# create data
x = np.linspace(-10, 10, 1000)
y = []
np.random.seed(42)
for i in range(1000):
y.append(f([np.random.rand(1) * 10, np.random.rand(1), 0.], x) + np.random.rand(len(x)))
# fit without multiprocessing
t1 = time.time()
for y_data in y:
p1 = least_squares(errorfunc, p0[:], args=(x, y_data))
t2 = time.time()
print t2 - t1
# fit with multiprocessing lib
times = []
for p in range(1,13):
my_pool = Pool(p)
t3 = time.time()
results = parmap.map(do_fit, y, x, pool=my_pool)
t4 = time.time()
times.append(t4-t3)
my_pool.close()
print times
For the 3 machines where it works, it speeds up roughly in the expected way. E.g. on my i7 laptop it gives:
[4.92650294303894, 2.5883090496063232, 1.7945551872253418, 1.629533052444458,
1.4896039962768555, 1.3550388813018799, 1.1796400547027588, 1.1852478981018066,
1.1404039859771729, 1.2239141464233398, 1.1676840782165527, 1.1416618824005127]
I'm running Ubuntu 14.10, Python 2.7.6, numpy 1.11.0 and scipy 0.17.0. I tested it on another Ubuntu machine, a Dell PowerEdge R210 with similar results and on a MacBook Pro Retina (here with Python 2.7.11, and same numpy and scipy versions).
The computer that causes issues is a PowerEdge R710 (two hexcores) running Ubuntu 15.10, Python 2.7.11 and same numpy and scipy version as above. However, I don't observe any speedup. Times are around 6 seconds, no matter what poolsize I use. In fact, it is slightly better for a poolsize of 2 and gets worse for more processes.
htop
shows that somehow more processes get spawned than I would expect.
E.g. on my laptop htop
shows one entry per process (which matches the poolsize) and eventually each process shows 100% CPU load.
On the PowerEdge R710 I see about 8 python processes for a poolsize of 1 and about 20 processes for a poolsize of 2 etc. each of which shows 100% CPU load.
I checked BIOS settings of the R710 and I couldn't find anything unusual. What should I look for?
EDIT: Answering to the comment, I used another simple script. Surprisingly this one seems to 'work' for all machines:
from multiprocessing import Pool
import time
import math
import numpy as np
def f_np(x):
return x**np.sin(x)+np.fabs(np.cos(x))**np.arctan(x)
def f(x):
return x**math.sin(x)+math.fabs(math.cos(x))**math.atan(x)
if __name__ == '__main__':
print "#pool", ", numpy", ", pure python"
for p in range(1,9):
pool = Pool(processes=p)
np.random.seed(42)
a = np.random.rand(1000,1000)
t1 = time.time()
for i in range(5):
pool.map(f_np, a)
t2 = time.time()
for i in range(5):
pool.map(f, range(1000000))
print p, t2-t1, time.time()-t2
pool.close()
gives:
#pool , numpy , pure python
1 1.34186911583 5.87641906738
2 0.697530984879 3.16030216217
3 0.470160961151 2.20742988586
4 0.35701417923 1.73128080368
5 0.308979988098 1.47339701653
6 0.286448001862 1.37223601341
7 0.274246931076 1.27663207054
8 0.245123147964 1.24748778343
on the machine that caused the trouble. There are no more threads (or processes?) spawned than I would expect.
It looks like numpy is not the problem, but as soon as I use scipy.optimize.least_squares
the issue arises.
Using on htop
on the processes shows a lot of sched_yield()
calls which I don't see if I don't use scipy.optimize.least_squares
and which I also don't see on my laptop even when using least_squares
.
Upvotes: 4
Views: 904
Reputation: 450
According to here, there is an issue when OpenBLAS is used together with joblib.
Similar issues occur when MKL is used (see here). The solution given here, also worked for me: Adding
import os
os.environ['MKL_NUM_THREADS'] = '1'
at the beginning of my python script solves the issue.
Upvotes: 4