SMX
SMX

Reputation: 1442

When is Queue.join() necessary?

The Python 3 docs give an example of a worker thread that uses a queue (https://docs.python.org/3/library/queue.html):

def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

q = queue.Queue()
threads = []
for i in range(num_worker_threads):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

for item in source():
    q.put(item)

# block until all tasks are done
q.join()

# stop workers
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()

In this example, why is q.join() necessary? Don't the subsequent q.put(None) and t.join() operations accomplish the same thing of blocking the main thread until the worker threads have completed?

Upvotes: 5

Views: 5520

Answers (1)

jarcobi889
jarcobi889

Reputation: 835

Here's how I'm understanding the example.

Each worker loops infinitely, always looking for something new from the Queue. If the item it gets is None, it breaks and returns control to main.

So, first we make the program wait for the Queue to be empty. Each call to q.task_done() marks a new item as complete. The code hangs on the following so we make sure every item is marked as done.

# block until all tasks are done
q.join()

Then, below, we add the same number of None items into the queue as there are workers (so we make sure each worker gets one.)

for i in range(num_worker_threads):
    q.put(None)

Next, we join all the threads. Since we gave every worker a None item through the Queue, they will all break. Until they all break and return control, we want to hang here.

for t in threads:
    t.join()

Doing it this way, we make sure that every item in the Queue is handled, every worker breaks when the Queue is empty, and each worker is shut down before we move on with our code, helping avoid orphan processes.

Upvotes: 6

Related Questions