Mikhail Genkin

Reputation: 3460

Multithreading accelerates CPU-bound tasks despite the GIL

I recently learned about the GIL in Python. I ran some benchmarks and found that multithreading actually does improve performance. I compare elementwise NumPy operations that do not use any internal multithreading. In the first test, I call a function 32 times sequentially from a for loop. In the second test, I run the same 32 calls in separate threads. If the GIL were in effect, only one thread should be active at a time in the second test, so the execution times should be approximately equal (or even worse in the second test because of threading overhead). This is not what I observed.

import threading
import time

import numpy as np

def elemntwiseoperations(a,b):
    # Elementwise NumPy work; the result is discarded because only the timing matters.
    np.exp(a)+np.sin(b)
        
N=1024
a=np.random.rand(N,N)
b=np.random.rand(N,N)


NoTasks=32

start_time = time.time()
for i in range(NoTasks):
    elemntwiseoperations(a,b)
# Measure the elapsed time once so that the total and per-task values are consistent.
elapsed = time.time() - start_time
print("Execution time for {} tasks: {} seconds, {} seconds per task".format(NoTasks, elapsed, elapsed/NoTasks))

threads=[]
start_time = time.time()
for i in range(NoTasks):
    x = threading.Thread(target=elemntwiseoperations,name='thread-{}'.format(i),args=(a,b))
    x.start()
    threads.append(x)
    
for thread in threads:
    thread.join()

# Measure the elapsed time once so that the total and per-task values are consistent.
elapsed = time.time() - start_time
print("Execution time for {} tasks: {} seconds, {} seconds per task".format(NoTasks, elapsed, elapsed/NoTasks))

Output:

Execution time for 32 tasks: 0.5654711723327637 seconds, 0.01767103374004364 seconds per task
Execution time for 32 tasks: 0.17153215408325195 seconds, 0.005360409617424011 seconds per task

P.S. macOS, Python 3.7.6, CPython implementation.

Upvotes: 4

Views: 387

Answers (1)

Mikhail Genkin

Reputation: 3460

So, my current best guess is the following. In the first case, a single thread runs the C routines sequentially, waiting for each one to finish before starting the next. Since I only use elementwise operations, which NumPy does not parallelize internally, only one thread is involved in the whole process.

In the second case, I start 32 threads, each of which is subject to the GIL. The first thread starts its C routine and releases the GIL to the second thread, then the second thread starts its C routine and hands the GIL to the third thread, and so on. Even though the C routines do not start at the same time, they all execute concurrently, because the C code runs after the GIL has been released (NumPy releases the GIL inside many of its elementwise loops).

I don't know how to verify this directly, but this is how I understand it after reading a couple of Python blog posts about the GIL.
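One possible way to test this guess is to run the same sequential-versus-threaded comparison on two workloads: a pure-Python loop, which holds the GIL the whole time, and the NumPy expression from the question. If the explanation is right, threads should give little or no speedup for the pure-Python loop but a clear speedup for the NumPy version on a multi-core machine. This is only a rough sketch: the helper names (pure_python_task, numpy_task, time_sequential, time_threaded, compare) are made up for illustration, and the actual timings depend on the machine and the NumPy build.

```python
import threading
import time

import numpy as np


def pure_python_task(n=200_000):
    # Pure-Python bytecode: the running thread holds the GIL for the whole loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def numpy_task(a, b):
    # Same elementwise expression as in the question.
    return np.exp(a) + np.sin(b)


def time_sequential(task, args=(), repeats=8):
    # Run the task `repeats` times in the calling thread and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(repeats):
        task(*args)
    return time.perf_counter() - start


def time_threaded(task, args=(), repeats=8):
    # Run the task once in each of `repeats` threads and return the elapsed seconds.
    threads = [threading.Thread(target=task, args=args) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def compare(size=1024, repeats=8):
    # Time both workloads both ways; returns {name: (sequential_s, threaded_s)}.
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    results = {
        "pure_python": (
            time_sequential(pure_python_task, (), repeats),
            time_threaded(pure_python_task, (), repeats),
        ),
        "numpy": (
            time_sequential(numpy_task, (a, b), repeats),
            time_threaded(numpy_task, (a, b), repeats),
        ),
    }
    for name, (seq, thr) in results.items():
        print("{}: sequential {:.4f} s, threaded {:.4f} s".format(name, seq, thr))
    return results
```

Calling compare() prints one line per workload. Comparing the sequential and threaded times for each workload shows whether only the NumPy case benefits from threads.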

Upvotes: 2
