Numpy matrix multiplications with multiprocessing suddenly slow down as dimension increase

Question

I want to do some large matrix multiplications using multiprocessing.Pool.

Suddenly, when the dimension is higher than 50, it takes an extremely long computation time.

Is there any easy way to be faster?

Here, I don't want to use shared memory like RawArray, because my original code randomly generate the matrix for each time.

The sample code is as follows.

import numpy as np
from time import time
from multiprocessing import Pool
from functools import partial

def f(d):
    a = int(10*d)
    N = int(10000/d)
    for _ in range(N):
        X = np.random.randn(a,10) @ np.random.randn(10,10)
    return X

# Dimensions
ds = [1,2,3,4,5,6,8,10,20,35,40,45,50,60,62,64,66,68,70,80,90,100]

# Serial processing
serial = []
for d in ds:
    t1 = time()
    for i in range(20):
        f(d)
    serial.append(time()-t1)

# Parallel processing
parallel = []
for d in ds:
    t1 = time()
    pool = Pool()
    for i in range(20):
        pool.apply_async(partial(f,d), args=())
    pool.close()
    pool.join()
    parallel.append(time()-t1)

# Plot
import matplotlib.pyplot as plt
plt.title('Matrix multiplication time with 10000/d repetitions')
plt.plot(ds,serial,label='serial')
plt.plot(ds,parallel,label='parallel')
plt.xlabel('d (dimension)') 
plt.ylabel('Total time (sec)')
plt.legend()
plt.show()

Due to the total computation cost of f(d) is the same for all d, the parallel processing time should be equal.

But the actual output is not.

System info:

Linux-4.15.0-47-generic-x86_64-with-debian-stretch-sid
3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz

NOTE I want to use parallel computation as a complicated internal simulation (like @), not sending data to the child process.

Seung Hyeon Yu · Accepted Answer

This is for self-reference.

Here, I found a solution.

My numpy uses MKL as backend, it may be the problem that MKL multithreading collides multiprocessing.

If I run the code:

import os
os.environ['MKL_NUM_THREADS'] = '1'

before importing numpy, then it solved.

Numpy matrix multiplications with multiprocessing suddenly slow down as dimension increase

Answers (2)

Related Questions