Reputation: 43
New to Python and trying to understand multi-threading. Here's an example from the Python documentation on Queue.
For the life of me, I can't understand how this example works. In the worker() function, there's an infinite loop. How does the worker know when to exit the loop? There seems to be no break condition.
And what exactly is the join doing at the end? Shouldn't I be joining the threads instead?
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
Also another question, When should multithreading be used and when should multiprocessing be used?
Upvotes: 3
Views: 1771
Reputation: 139
I mostly agree with joel-cornett. I tried running the following snippet in Python 2.7:
from threading import Thread
from Queue import Queue

def worker():
    def do_work(item):
        print(item)
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(4):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in range(10):
    q.put(item)

q.join()
The output is:
0
1
2
3
4
5
6
7
8
9
Exception in thread Thread-3 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 504, in run
File "abc.py", line 9, in worker
File "/usr/lib/python2.7/Queue.py", line 168, in get
File "/usr/lib/python2.7/threading.py", line 236, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
The most probable explanation, I think:
Once the tasks are exhausted and q.join() returns, the parent thread exits and the interpreter starts tearing down, destroying the queue. The daemon child threads, still blocked inside "item = q.get()", are then terminated by the first TypeError raised there, because the queue's internals no longer exist during shutdown.
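One way to avoid that shutdown race is to send each worker an explicit sentinel so it exits cleanly instead of blocking in q.get() while the interpreter tears down. A sketch of that pattern (Python 3's queue module here; in Python 2 the import would be Queue, as above; the doubling stands in for do_work):

```python
from queue import Queue
from threading import Thread

def worker(q, results):
    while True:
        item = q.get()
        if item is None:          # sentinel: exit cleanly instead of blocking forever
            q.task_done()
            break
        results.append(item * 2)  # stand-in for do_work(item)
        q.task_done()

q = Queue()
results = []
threads = [Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()

for item in range(10):
    q.put(item)
for _ in threads:                 # one sentinel per worker
    q.put(None)

q.join()                          # all items (and sentinels) processed
for t in threads:
    t.join()                      # workers have actually exited by now
print(sorted(results))            # prints [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because every worker exits before the main thread does, there is nothing left blocked in q.get() when the interpreter shuts down, and the TypeError disappears.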
Upvotes: 0
Reputation: 19057
Regarding your second question, the biggest difference between threads and processes in Python is that the mainstream implementations use a global interpreter lock (GIL) to ensure that multiple threads can't mess up Python's internal data structures. This means that for programs that spend most of their time doing computation in pure Python, even with multiple CPUs you're not going to speed the program up much because only one thread at a time can hold the GIL. On the other hand, multiple threads can trivially share data in a Python program, and in some (but by no means all) cases, you don't have to worry too much about thread safety.
Where multithreading can speed up a Python program is when the program spends most of its time waiting on I/O -- disk access or, particularly these days, network operations. The GIL is not held while doing I/O, so many Python threads can run concurrently in I/O bound applications.
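A toy illustration of that point, with time.sleep standing in for a network call (like real I/O, sleeping releases the GIL), so ten "requests" finish in roughly the time of one rather than ten:

```python
import time
from threading import Thread

def fake_io(i, results):
    time.sleep(0.2)      # stands in for a network call; releases the GIL
    results[i] = i

results = [None] * 10
start = time.time()
threads = [Thread(target=fake_io, args=(i, results)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(results, round(elapsed, 1))  # ~0.2s total, not ~2s
```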
On the other hand, with multiprocessing, each process has its own GIL, so your performance can scale to the number of CPU cores you have available. The down side is that all communication between the processes will have to be done through a multiprocessing.Queue (which acts on the surface very like a Queue.Queue, but has very different underlying mechanics, since it has to communicate across process boundaries).
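For CPU-bound work, a minimal sketch of the multiprocessing side (using multiprocessing.Pool, which hides the queue plumbing; the __main__ guard is required so that child processes don't re-execute the setup when they import the module):

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work: each worker process has its own GIL,
    # so this can scale with the number of cores
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```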
Since funneling all communication through a thread-safe or interprocess queue avoids a lot of potential concurrency problems, and since Python makes it so easy, the multiprocessing module is very attractive.
Upvotes: 5
Reputation: 24788
Yup, you're right: worker will run forever. However, since the queue only holds a finite number of items, worker will eventually block permanently at q.get() (since there will be no more items in the queue). At that point, it's inconsequential that worker is still running. q.join() blocks until the queue's task count drops to 0 (whenever a worker thread calls q.task_done(), the count drops by 1). After that, the program ends, and the infinitely blocking daemon threads die with their creator.
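That counting behaviour can be observed directly; a minimal sketch (Python 3's queue module here; the queue/thread names are illustrative):

```python
from queue import Queue
from threading import Thread

q = Queue()
processed = []

def worker():
    while True:
        item = q.get()        # blocks forever once the queue is empty
        processed.append(item)
        q.task_done()         # decrements the count that q.join() waits on

t = Thread(target=worker)
t.daemon = True               # dies with the main thread, as in the question
t.start()

for item in range(5):
    q.put(item)
q.join()                      # returns once task_done() has been called 5 times
print(processed)              # [0, 1, 2, 3, 4]
```

When the script reaches its end, the worker is still alive and blocked in q.get(), but because it is a daemon thread the interpreter exits anyway.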
Upvotes: 6