Seung Hyeon Yu

Reputation: 178

Numpy matrix multiplications with multiprocessing suddenly slow down as the dimension increases

I want to do some large matrix multiplications using multiprocessing.Pool.

But as soon as the dimension exceeds roughly 50, the computation suddenly takes an extremely long time.

Is there an easy way to make it faster?

I don't want to use shared memory such as RawArray, because my original code randomly generates the matrices each time.

The sample code is as follows.

import numpy as np
from time import time
from multiprocessing import Pool
from functools import partial

# Total work is roughly constant in d: 10000/d products of a (10*d, 10) matrix with a (10, 10) matrix.
def f(d):
    a = int(10*d)
    N = int(10000/d)
    for _ in range(N):
        X = np.random.randn(a,10) @ np.random.randn(10,10)
    return X

# Dimensions
ds = [1,2,3,4,5,6,8,10,20,35,40,45,50,60,62,64,66,68,70,80,90,100]

# Serial processing
serial = []
for d in ds:
    t1 = time()
    for i in range(20):
        f(d)
    serial.append(time()-t1)

# Parallel processing
parallel = []
for d in ds:
    t1 = time()
    pool = Pool()
    for i in range(20):
        pool.apply_async(partial(f,d), args=())
    pool.close()
    pool.join()
    parallel.append(time()-t1)

# Plot
import matplotlib.pyplot as plt
plt.title('Matrix multiplication time with 10000/d repetitions')
plt.plot(ds,serial,label='serial')
plt.plot(ds,parallel,label='parallel')
plt.xlabel('d (dimension)') 
plt.ylabel('Total time (sec)')
plt.legend()
plt.show()

Since the total computation cost of f(d) is roughly the same for every d, the parallel processing time should also be roughly equal across d.
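As a quick back-of-the-envelope check of that claim (a rough estimate that ignores allocation and caching overhead): each call to f(d) does 10000/d products of a (10*d, 10) matrix with a (10, 10) matrix, which is about 10*d*10*10 multiply-adds per product, so roughly 1e7 multiply-adds in total for every d.

# Rough multiply-add count for one call to f(d): N products of (10*d, 10) @ (10, 10).
for d in [1, 2, 5, 10, 50, 100]:
    a, N = 10 * d, 10000 // d
    print(d, N * a * 10 * 10)   # ~1e7 for every d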

But the measured timings are not:

[Plot: total time vs. d for the serial and parallel runs; the parallel times blow up once d exceeds about 50]

System info:

Linux-4.15.0-47-generic-x86_64-with-debian-stretch-sid
3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz

NOTE: I want to use the parallel computation for a complicated internal simulation in each worker (the @ products above are just a stand-in), not for sending data to the child processes.

Upvotes: 4

Views: 705

Answers (2)

Shane P Kelly

Reputation: 221

I just found an explanation here: https://github.com/numpy/numpy/issues/10145. It looks like CPU caching gets messed up when conflicting MKL matrix multiplications run at the same time.
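(Not part of the original answer, just a quick way to check whether this applies to you.) You can confirm whether your NumPy build is actually linked against MKL by printing its build configuration:

import numpy as np

# Lists the BLAS/LAPACK libraries this NumPy build is linked against;
# look for "mkl" entries if you suspect the MKL-vs-multiprocessing conflict.
np.show_config()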

Upvotes: 0

Seung Hyeon Yu

Reputation: 178

This is for self-reference.

I found a solution.

My NumPy uses MKL as its backend; the problem seems to be that MKL's internal multithreading collides with multiprocessing.

If I run the code:

import os
os.environ['MKL_NUM_THREADS'] = '1'

before importing numpy, the problem is solved.

[Plot: with MKL_NUM_THREADS=1, the parallel times no longer blow up at large d]
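For reference, a minimal sketch of how the fix slots into the benchmark above (my own arrangement; the key point is that the variable is set before NumPy is imported, so MKL starts single-threaded in the parent and in every forked worker):

import os

# Must run before the first `import numpy`, otherwise MKL may already have
# initialized its thread pool with the default number of threads.
os.environ['MKL_NUM_THREADS'] = '1'

import numpy as np
from multiprocessing import Pool

def f(d):
    a, N = int(10 * d), int(10000 / d)
    for _ in range(N):
        X = np.random.randn(a, 10) @ np.random.randn(10, 10)
    return X

if __name__ == '__main__':
    with Pool() as pool:
        # Each worker now runs single-threaded MKL, so the processes no
        # longer oversubscribe the cores.
        results = [pool.apply_async(f, args=(d,)) for d in [60, 70, 80, 90, 100]]
        _ = [r.get() for r in results]

If NumPy has already been imported by the time you can set the variable, the third-party threadpoolctl package can impose the same one-thread limit at runtime instead.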

Upvotes: 1
