JZee

Reputation: 43

Basic python multi-threading issue

I'm new to Python and trying to understand multithreading. Below is an example from the Python documentation on Queue.

For the life of me, I don't understand how this example works. The worker() function contains an infinite loop. How does the worker know when to leave the loop? There seems to be no exit condition.

And what exactly is q.join() doing at the end? Shouldn't I be joining the threads instead?

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done

One more question: when should multithreading be used, and when should multiprocessing be used?

Upvotes: 3

Views: 1771

Answers (3)

Titon

Reputation: 139

I mostly agree with Joel Cornett. I ran the following snippet in Python 2.7:

from threading import Thread
from Queue import Queue

def worker():
    def do_work(item):
        print(item)

    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(4):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in range(10):
    q.put(item)

q.join()

The output is:

0
1
2
3
4
5
6
7
8
9
Exception in thread Thread-3 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 504, in run
  File "abc.py", line 9, in worker
  File "/usr/lib/python2.7/Queue.py", line 168, in get
  File "/usr/lib/python2.7/threading.py", line 236, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

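The most probable explanation, I think:

Once the queue is empty and every task has been marked done, q.join() returns and the main thread exits. The daemon workers are still blocked in "item = q.get()" at that point. During interpreter shutdown, Python 2 tears down module globals (setting them to None) while those threads may still be running, so a worker that wakes up inside Queue.get() ends up calling a name that is now None, which produces the TypeError. The interpreter's own note, "most likely raised during interpreter shutdown", points the same way. The queue is not explicitly destroyed by q.join(); the error is a side effect of shutdown, and the child threads are terminated along with the process.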
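One way to avoid relying on interpreter shutdown to kill the workers is to give them an explicit exit condition. The following is only a sketch, written for Python 3 (where the module is named queue); the STOP sentinel, NUM_WORKERS, and the processed list are illustrative names that are not part of the snippet above. It puts one sentinel per worker on the queue and then joins the threads themselves, so nothing is left running when the main thread exits.

```python
import threading
from queue import Queue  # named "Queue" in Python 2

STOP = object()          # hypothetical sentinel telling a worker to exit
NUM_WORKERS = 4          # illustrative value
q = Queue()
processed = []           # stand-in for the real work's output
processed_lock = threading.Lock()

def worker():
    while True:
        item = q.get()
        if item is STOP:
            q.task_done()        # the sentinel counts as a task too
            break                # explicit exit condition
        with processed_lock:
            processed.append(item)
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for item in range(10):
    q.put(item)
for _ in threads:
    q.put(STOP)                  # one sentinel per worker

q.join()                         # every item (and sentinel) has been handled
for t in threads:
    t.join()                     # the workers have actually returned
```

Since each worker stops after reading exactly one sentinel, non-daemon threads are fine here: they all finish on their own.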

Upvotes: 0

Russell Borogove

Reputation: 19057

Regarding your second question, the biggest difference between threads and processes in Python is that the mainstream implementations use a global interpreter lock (GIL) to ensure that multiple threads can't mess up Python's internal data structures. This means that for programs that spend most of their time doing computation in pure Python, even with multiple CPUs you're not going to speed the program up much because only one thread at a time can hold the GIL. On the other hand, multiple threads can trivially share data in a Python program, and in some (but by no means all) cases, you don't have to worry too much about thread safety.

Where multithreading can speed up a Python program is when the program spends most of its time waiting on I/O: disk access or, particularly these days, network operations. The GIL is released while a thread is blocked on I/O, so many Python threads can wait concurrently in I/O-bound applications.

On the other hand, with multiprocessing, each process has its own interpreter and its own GIL, so your performance can scale with the number of CPU cores you have available. The downside is that processes don't share memory by default, so communication between them has to go through an explicit channel, typically a multiprocessing.Queue (which on the surface acts very much like a Queue.Queue, but has very different underlying mechanics, since it has to communicate across process boundaries).
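A rough illustration of the I/O-bound case, as a Python 3 sketch: time.sleep stands in for a blocking network or disk call (an assumption made to keep the example self-contained; sleep, like blocking I/O, does not hold the GIL), and fake_io is a hypothetical helper. Four such calls run on four threads and overlap instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(n):
    # Hypothetical stand-in for a blocking I/O call; the GIL is
    # released while the thread sleeps.
    time.sleep(0.3)
    return n * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    doubled = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

# Run sequentially, the four calls would take about 1.2 seconds;
# with four threads the waits overlap.
print(doubled, round(elapsed, 2))
```

Replacing the sleep with a pure-Python computation would largely remove the benefit, because only one thread at a time could hold the GIL.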

Since working through a thread safe or interprocess queue avoids a lot of potential threading problems, and since Python makes it so easy, the multiprocessing module is very attractive.

Upvotes: 5

Joel Cornett

Reputation: 24788

Yup, you're right: worker will run forever. However, since the queue only ever holds a finite number of items, each worker eventually blocks permanently at q.get() once there are no more items. At that point, it's inconsequential that the worker is still running. q.join() blocks until the queue's count of unfinished tasks drops to 0 (each q.put() raises the count by 1, and each q.task_done() call lowers it by 1). After that, the main thread finishes and the program ends. Because the workers are daemon threads (t.daemon = True), they don't keep the program alive, and the permanently blocked threads die with their creator.
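The behaviour described above can be checked with a small Python 3 sketch. The results list and its lock are a hypothetical stand-in for do_work, not part of the documentation example. After q.join() returns, all ten items have been handled, yet every worker still reports itself alive because it is parked in q.get().

```python
import threading
from queue import Queue  # named "Queue" in Python 2

q = Queue()
results = []                     # hypothetical stand-in for do_work's output
results_lock = threading.Lock()

def worker():
    while True:
        item = q.get()           # blocks forever once the queue stays empty
        with results_lock:
            results.append(item)
        q.task_done()            # one fewer unfinished task

threads = []
for _ in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True              # daemon threads do not keep the program alive
    t.start()
    threads.append(t)

for item in range(10):
    q.put(item)

q.join()                         # returns after task_done() was called 10 times

still_alive = [t.is_alive() for t in threads]
print(sorted(results))
print(still_alive)
```

So q.join() waits for the work, not for the threads; joining the threads themselves here would block forever, since worker never returns.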

Upvotes: 6
