Reputation: 5067
I am running on a machine with two AMD 7302 16-core processors (32 cores in total). I'm on a Red Hat 8.4 system and using Python 3.10.6.
I've recently started learning the multiprocessing library. Inspired by the first example on the documentation page, I wrote my own little piece of code:
from multiprocessing import Pool
import numpy as np
import sys
import datetime

def f(x):
    return x**2

def main(DataType="List", NThr=2, Vectorize=False):
    N = 5*10**7  # number of elements
    n = NThr     # number of worker processes in the pool
    y = np.zeros(N)
    # Use list
    if(DataType == "List"):
        x = []
        for i in range(N):
            x.append(i)
    # Use Numpy
    elif(DataType == "Numpy"):
        x = np.zeros(N)
        for i in range(len(x)):
            x[i] = i
    # Run parallel code
    t0 = datetime.datetime.now()
    if(n == 1):
        if(DataType == "Numpy" and Vectorize == True):
            y = np.vectorize(f)(x)
        else:
            for i in range(len(x)):
                y[i] = f(x[i])
    else:
        with Pool(n) as p:
            y = p.map(f, x)
    t1 = datetime.datetime.now()
    dt = (t1 - t0).total_seconds()
    print("{} : Vect = {}, n = {}, time : {}s".format(DataType, Vectorize, n, dt))
    sys.exit(0)

if __name__ == "__main__":
    main()
I noticed that when I try to run p.map() over a numpy array, it performs substantially worse. Here is the output from several runs (python mycode.py) after twiddling the args to main:
Numpy : Vect = True, n = 1, time : 9.566441s
Numpy : Vect = False, n = 1, time : 16.00333s
Numpy : Vect = False, n = 2, time : 143.331352s
List : Vect = False, n = 1, time : 21.11657s
List : Vect = False, n = 2, time : 11.868897s
List : Vect = False, n = 5, time : 6.162561s
Look at the (Numpy, n=2) run at 143s. Its runtime is substantially worse than the (List, n=2) run at 11.9s, and also much worse than either of the (Numpy, n=1) runs.
Question: What makes numpy arrays take so long to run with the multiprocessing library, specifically when NThr==2?
EDIT: Per a comment's suggestion, I ran both versions, (Numpy, n=2) and (List, n=2), through the profiler:
>>> import cProfile
>>> from mycode import main
>>> cProfile.run('main()')
and compared them side by side. The most time-consuming function calls, and the calls whose counts differ between the two versions, are listed below.
For Numpy version :
ncalls tottime percall cumtime percall filename:lineno(function)
# Time consuming
1 0.000 0.000 138.997 138.997 pool.py:362(map)
1 0.000 0.000 138.956 138.956 pool.py:764(wait)
1 0.000 0.000 138.956 138.956 pool.py:767(get)
4 0.000 0.000 138.957 34.739 threading.py:288(wait)
4 0.000 0.000 138.957 34.739 threading.py:589(wait)
14/1 0.000 0.000 145.150 145.150 {built-in method builtins.exec}
19 138.957 7.314 138.957 7.314 {method 'acquire' of '_thread.lock' objects}
# Different number of calls
6 0.000 0.000 0.088 0.015 popen_fork.py:24(poll)
1 0.000 0.000 0.088 0.088 popen_fork.py:36(wait)
1 0.000 0.000 0.088 0.088 process.py:142(join)
10 0.000 0.000 0.000 0.000 process.py:99(_check_closed)
18 0.000 0.000 0.000 0.000 util.py:48(debug)
76 0.000 0.000 0.000 0.000 {built-in method builtins.len}
2 0.000 0.000 0.000 0.000 {built-in method numpy.zeros}
17 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
6 0.088 0.015 0.088 0.015 {built-in method posix.waitpid}
3 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
For List version :
ncalls tottime percall cumtime percall filename:lineno(function)
# Time consuming
1 0.000 0.000 13.961 13.961 pool.py:362(map)
1 0.000 0.000 13.920 13.920 pool.py:764(wait)
1 0.000 0.000 13.920 13.920 pool.py:767(get)
4 0.000 0.000 13.921 3.480 threading.py:288(wait)
4 0.000 0.000 13.921 3.480 threading.py:589(wait)
14/1 0.000 0.000 24.475 24.475 {built-in method builtins.exec}
19 13.921 0.733 13.921 0.733 {method 'acquire' of '_thread.lock' objects}
# Different number of calls
7 0.000 0.000 0.132 0.019 popen_fork.py:24(poll)
2 0.000 0.000 0.132 0.066 popen_fork.py:36(wait)
2 0.000 0.000 0.132 0.066 process.py:142(join)
12 0.000 0.000 0.000 0.000 process.py:99(_check_closed)
19 0.000 0.000 0.000 0.000 util.py:48(debug)
75 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method numpy.zeros}
18 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
7 0.132 0.019 0.132 0.019 {built-in method posix.waitpid}
50000003 2.780 0.000 2.780 0.000 {method 'append' of 'list' objects}
Note that the List version makes 50000003 calls to append() (due to the initialization of x), compared to 3 calls to append() in the Numpy version.
Upvotes: 3
Views: 393
Reputation: 16174
There's so much going on in the question it's hard to know where to start! The main issues are:

- f isn't "doing" anything computationally intensive
- when you call multiprocessing.Pool.map(fn, objs), objs is iterated over, and every item (and every result) is pickled to be shuttled between the parent and the worker processes

Put these together and you're basically just benchmarking the pickle module, which for your case means benchmarking pickle.loads(pickle.dumps(list(y))), and that results in the terrible performance you see.
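You can see the second point directly: iterating a numpy array hands the pool numpy scalar objects, not plain Python floats. A quick check (using a small array in place of the question's x):

import numpy as np

x = np.zeros(3)
# Pool.map iterates the array, so the workers receive numpy scalars,
# each of which must be pickled individually
print(type(x[0]))  # <class 'numpy.float64'>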
To explain in a bit more detail, I'll write some code:
from time import perf_counter
from multiprocessing import Pool
import pickle
import numpy as np

class CtxTimer:
    "Context manager to time execution of code and save runtime for later"

    def __init__(self, message):
        self.message = message

    def __enter__(self):
        self.start_time = perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_time = perf_counter()
        print(
            f"{self.message:15} "
            f"time={self.wall*1000:.2f}ms"
        )

    @property
    def wall(self):
        return self.end_time - self.start_time

def f(x):
    return x**2

# bit shorter than your example, but don't want to wait that long
objs = list(range(10**6))
# direct conversion to int64 numpy array (your code was using floats)
objs_np = np.array(objs)

with CtxTimer("Naive Python") as py_naive:
    list(map(f, objs))

with Pool(4) as pool:
    with CtxTimer("MP Pool Python") as py_pool:
        pool.map(f, objs)
    with CtxTimer("MP Pool Numpy") as np_pool:
        pool.map(f, objs_np)

with CtxTimer("Pickle Python") as py_dumps:
    buf = pickle.dumps(objs)
    print(f" {len(buf)/1e6 = :.2f}")

with CtxTimer("Unpickle Python") as py_loads:
    pickle.loads(buf)

with CtxTimer("Pickle Numpy") as np_dumps:
    buf = pickle.dumps(list(objs_np))
    print(f" {len(buf)/1e6 = :.2f}")

with CtxTimer("Unpickle Numpy") as np_loads:
    pickle.loads(buf)
On my deliberately underclocked laptop, and limiting the pool to 4 subprocesses, this gives:

- The Naive Python block takes ~550ms, so in 1 second we can evaluate ~3 million function calls, each just squaring an integer.
- Pickle Python takes ~30ms and Unpickle Python takes ~60ms. These operations happen every time you use the Pool to transfer objects between processes. Note that this produces a ~5MB buffer.
- Pickle Numpy takes ~3700ms and Unpickle Numpy takes ~450ms. The difference is because pickle special-cases a few common datatypes, like list and int, that we happen to be benchmarking here. Because this is just a list of scalar Numpy values, you hit the slow path.
- MP Pool Python takes ~300ms and MP Pool Numpy takes ~4300ms.
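The special-casing is easy to see at the level of a single element: a plain int pickles via a dedicated opcode, while a numpy scalar goes through the generic reduce protocol (exact sizes vary slightly with the pickle protocol version):

import pickle
import numpy as np

# a plain int has a dedicated fast path in pickle
print(len(pickle.dumps(7)))            # a handful of bytes
# a numpy scalar uses the generic object machinery and is much bulkier
print(len(pickle.dumps(np.int64(7))))  # several dozen bytes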
To explain these numbers, you need to think about what is happening behind the scenes in the pool. First there's the parent process, which spawns the four child processes. It has to send all the work to the children, which involves calling pickle.dumps on everything and pushing the buffers to the child processes. It then waits for the results to come back and decodes them with pickle.loads. This seems to be the limiting factor, especially so in the Numpy case. Each child process gets a quarter of the items, decodes them, processes them, then encodes the results to send back.
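As a rough mental model, one task round-trip looks something like this (a deliberately simplified sketch that ignores chunk sizing, task ordering, and the queue plumbing the real pool uses):

import pickle

def f(x):  # same toy function as above
    return x**2

def simulate_round_trip(chunk):
    # parent -> child: the work is serialised and sent over a pipe
    sent = pickle.dumps(chunk)
    # child: decode its share, compute, encode the results
    done = pickle.dumps([f(x) for x in pickle.loads(sent)])
    # child -> parent: results come back and are decoded
    return pickle.loads(done)

assert simulate_round_trip([1, 2, 3]) == [1, 4, 9]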
Calculating this for the Numpy variant, I get:

max(np_dumps.wall + np_loads.wall, (np_dumps.wall + py_naive.wall + np_loads.wall) / 4)

which outputs ~4.2s. Note that I'm using max because we have to wait for the slowest part of the system. This agrees pretty well with my observed ~4300ms, but the same calculation for the Python variant isn't so close.
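For reference, the analogous estimate for the Python variant, using the same timer names from the listing above, would be:

max(py_dumps.wall + py_loads.wall, (py_dumps.wall + py_naive.wall + py_loads.wall) / 4)

With the timings quoted above this comes out around 160ms, noticeably below the observed ~300ms, which suggests fixed per-task overheads account for a larger share of the Python variant's runtime.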
Just to point out why Numpy vectorisation should be preferred, the following code:

with CtxTimer("Naive Numpy") as np_naive:
    f(objs_np)

runs in ~2ms, i.e. more than 200x faster than the naive Python implementation. It's worth getting this speedup before dealing with the awkwardness of Python's process-based parallelism, which could give you a maximum 32x speedup when doing pure-Python things, or 6400x when doing more optimal things.
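As an aside, if you really do need the pool with numpy data, one workaround (my suggestion, not benchmarked above) is to convert the array to plain Python objects first so pickle's fast path applies, reusing CtxTimer, f, Pool and objs_np from the listing above:

# hypothetical variant: .tolist() yields plain Python ints, which hit
# pickle's fast path, at the cost of materialising the list up front
with Pool(4) as pool:
    with CtxTimer("MP Pool tolist") as py_tolist:
        pool.map(f, objs_np.tolist())

Though as the numbers above show, for work this cheap plain vectorisation will still beat any pool-based approach.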
Upvotes: 3