Reputation: 178
I want to do some large matrix multiplications using multiprocessing.Pool
.
Suddenly, when the dimension is higher than 50, it takes an extremely long computation time.
Is there any easy way to be faster?
Here, I don't want to use shared memory like RawArray
, because my original code randomly generate the matrix for each time.
The sample code is as follows.
import numpy as np
from time import time
from multiprocessing import Pool
from functools import partial
def f(d):
a = int(10*d)
N = int(10000/d)
for _ in range(N):
X = np.random.randn(a,10) @ np.random.randn(10,10)
return X
# Dimensions
ds = [1,2,3,4,5,6,8,10,20,35,40,45,50,60,62,64,66,68,70,80,90,100]
# Serial processing
serial = []
for d in ds:
t1 = time()
for i in range(20):
f(d)
serial.append(time()-t1)
# Parallel processing
parallel = []
for d in ds:
t1 = time()
pool = Pool()
for i in range(20):
pool.apply_async(partial(f,d), args=())
pool.close()
pool.join()
parallel.append(time()-t1)
# Plot
import matplotlib.pyplot as plt
plt.title('Matrix multiplication time with 10000/d repetitions')
plt.plot(ds,serial,label='serial')
plt.plot(ds,parallel,label='parallel')
plt.xlabel('d (dimension)')
plt.ylabel('Total time (sec)')
plt.legend()
plt.show()
Due to the total computation cost of f(d)
is the same for all d
, the parallel processing time should be equal.
But the actual output is not.
System info:
Linux-4.15.0-47-generic-x86_64-with-debian-stretch-sid
3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0]
Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz
NOTE I want to use parallel computation as a complicated internal simulation (like
@
), not sending data to the child process.
Upvotes: 4
Views: 705
Reputation: 221
I just found an explanation here: https://github.com/numpy/numpy/issues/10145. Looks like the CPU caching gets messed up when you have conflicting MKL matrix multiplications going at the same time.
Upvotes: 0
Reputation: 178
This is for self-reference.
Here, I found a solution.
My numpy
uses MKL as backend, it may be the problem that MKL multithreading collides multiprocessing
.
If I run the code:
import os
os.environ['MKL_NUM_THREADS'] = '1'
before importing numpy
, then it solved.
Upvotes: 1