shx2

Reputation: 64328

Profiling the GIL

Is there a way to profile a Python process's usage of the GIL? Basically, I want to find out what percentage of the time the GIL is held. The process is single-threaded.

My motivation is that I have some code written in Cython, which uses nogil. Ideally, I would like to run it in a multi-threaded process, but in order to know if that can potentially be a good idea, I need to know if the GIL is free a significant amount of the time.


I found this related question, from 8 years ago. The sole answer there is "No". Hopefully, things have changed since then.

Upvotes: 6

Views: 2183

Answers (3)

thatsafunnyname

Reputation: 176

If you are wondering how many times the GIL is taken, you can use gdb breakpoints. For example:

> cat gil_count_example.py
import sys
from threading import Thread

def worker():
    k = 0
    for j in range(10000000):
        k += j

num_threads = int(sys.argv[1])
threads = []
for i in range(num_threads):
    t = Thread(target=worker)
    t.start()
    threads.append(t)

for t in threads:
    t.join()

For Python 3.x, break on take_gil:

> cgdb --args python3 gil_count_example.py 8
(gdb) b take_gil
(gdb) ignore 1 100000000
(gdb) r
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff7c85f10 in take_gil
                                                   at Python-3.4.3/Python/ceval_gil.h:208
        breakpoint already hit 1886 times

For Python 2.x, break on PyThread_acquire_lock:

> cgdb --args python2 gil_count_example.py 8
(gdb) b PyThread_acquire_lock
(gdb) ignore 1 100000000
(gdb) r
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
  1       breakpoint     keep y   0x00000039bacfd410 
        breakpoint already hit 1584561 times

An efficient "poor man's profiler" can also be used to profile the wall time spent in functions; I use https://github.com/knielsen/knielsen-pmp

> ./get_stacktrace --max=100 --freq=10 `/sbin/pidof python2`
...
292  71.92% sem_wait:PyThread_acquire_lock


> ./get_stacktrace --max=100 --freq=10 `/sbin/pidof python3`
...
557  77.68%  pthread_cond_timedwait:take_gil

Upvotes: 0

shx2

Reputation: 64328

Completely by accident, I found a tool that does just this: gil_load.

It was actually published shortly after I posted the question.

Well done, @chrisjbillington.

>>> import sys, math
>>> import gil_load
>>> gil_load.init()
>>> gil_load.start(output = sys.stdout)
>>> for x in range(1, 1000000000):
...     y = math.log(x**math.pi)
[2017-03-15 08:52:26]  GIL load: 0.98 (0.98, 0.98, 0.98)
[2017-03-15 08:52:32]  GIL load: 0.99 (0.99, 0.99, 0.99)
[2017-03-15 08:52:37]  GIL load: 0.99 (0.99, 0.99, 0.99)
[2017-03-15 08:52:43]  GIL load: 0.99 (0.99, 0.99, 0.99)
[2017-03-15 08:52:48]  GIL load: 1.00 (1.00, 1.00, 1.00)
[2017-03-15 08:52:52]  GIL load: 1.00 (1.00, 1.00, 1.00)
<...>

>>> import sys, math
>>> import gil_load
>>> gil_load.init()
>>> gil_load.start(output = sys.stdout)
>>> for x in range(1, 1000000000):
...     with open('/dev/null', 'a') as f:
...         print(math.log(x**math.pi), file=f)

[2017-03-15 08:53:59]  GIL load: 0.76 (0.76, 0.76, 0.76)
[2017-03-15 08:54:03]  GIL load: 0.77 (0.77, 0.77, 0.77)
[2017-03-15 08:54:09]  GIL load: 0.78 (0.78, 0.78, 0.78)
[2017-03-15 08:54:13]  GIL load: 0.80 (0.80, 0.80, 0.80)
[2017-03-15 08:54:19]  GIL load: 0.81 (0.81, 0.81, 0.81)
[2017-03-15 08:54:23]  GIL load: 0.81 (0.81, 0.81, 0.81)
[2017-03-15 08:54:28]  GIL load: 0.81 (0.81, 0.81, 0.81)
[2017-03-15 08:54:33]  GIL load: 0.80 (0.80, 0.80, 0.80)
<...>

Upvotes: 8

Heikki Toivonen

Reputation: 31150

I don't know of such a tool.

But there are some heuristics that can help you guess whether or not going multithreaded would help. As you probably know, the GIL is released during IO operations and during some calls into native code, especially in third-party native modules. If you don't have much code like that, then multithreading is not likely to help you.

If you do have IO/native code, then you'd probably have to just try it out. Depending on the code base, converting the whole thing to take advantage of multiple threads might be a lot of work, so you might want to instead apply multithreading only to the parts where you know IO/native code is getting called, and measure to see if you get any improvement.
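
The kind of measurement described above can be sketched like this (a hedged illustration: `io_task` and the timings are made up, with `time.sleep` standing in for real blocking IO, during which CPython also releases the GIL):

```python
# Compare serial vs threaded execution of an IO-bound task.
# time.sleep releases the GIL, just like real blocking IO does,
# so threads can overlap the waiting.
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.2)  # GIL is released while blocking

# Serial: the four waits happen one after another.
start = time.perf_counter()
for i in range(4):
    io_task(i)
serial = time.perf_counter() - start

# Threaded: the four waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(io_task, range(4)))
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```

If the threaded version is not meaningfully faster than the serial one for your real workload, the GIL (or the lack of true blocking) is probably the bottleneck.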

Depending on your use case, multiprocessing could work for workloads that are primarily CPU bound. Multiprocessing does add overhead, so it is typically a good approach only for CPU-bound tasks that last a relatively long time (several seconds or longer).
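
For the CPU-bound case, a minimal sketch (the function and workload sizes here are hypothetical, chosen only for illustration) of farming work out to a multiprocessing.Pool:

```python
# Multiprocessing sidesteps the GIL by using separate processes,
# each with its own interpreter and its own GIL.
from multiprocessing import Pool

def cpu_task(n):
    # Pure-Python arithmetic holds the GIL the whole time,
    # so threads would not help here -- separate processes can.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [200_000] * 4
    with Pool(processes=4) as pool:
        results = pool.map(cpu_task, work)  # tasks run in parallel processes
    print(results)
```

Note that the per-task work must outweigh the cost of pickling arguments and results across process boundaries, which is why short-lived tasks usually see no benefit.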

Upvotes: 0
