user1050619
user1050619

Reputation: 20856

Python multithreading using queue

I was reading a article on Python multi threading using Queues and have a basic question.

Based on the print stmt, 5 threads are started as expected. So, how does the queue works?

1.The thread is started initially and when the queue is populated with a item does it gets restarted and starts processing that item? 2.If we use the queue system and threads process each item by item in the queue, how there is a improvement in performance..Is it not similar to serial processing ie; 1 by 1.

import Queue
import threading
import urllib2
import datetime
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
"http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):

  def __init__(self, queue):
    threading.Thread.__init__(self)
    print 'threads are created'
    self.queue = queue

  def run(self):
    while True:
      #grabs host from queue
      print 'thread startting to run'
      now = datetime.datetime.now()

      host = self.queue.get()

      #grabs urls of hosts and prints first 1024 bytes of page
      url = urllib2.urlopen(host)
      print 'host=%s ,threadname=%s' % (host,self.getName())
      print url.read(20)

      #signals to queue job is done
      self.queue.task_done()

start = time.time()
if __name__ == '__main__':

  #spawn a pool of threads, and pass them queue instance 
    print 'program start'
    for i in range(5):

        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

 #populate queue with data   
    for host in hosts:
        queue.put(host)

 #wait on the queue until everything has been processed     
    queue.join()


    print "Elapsed Time: %s" % (time.time() - start)

Upvotes: 1

Views: 407

Answers (1)

jdi
jdi

Reputation: 92559

A queue is similar to a list container, but with internal locking to make it a thread-safe way to communicate data.

What happens when you start all of your threads is that they all block on the self.queue.get() call, waiting to pull an item from the queue. When an item is put into the queue from your main thread, one of the threads will become unblocked and receive the item. It can then continue to process it until it finishes and returns to a blocking state.

All of your threads can run concurrently because they all are able to receive items from the queue. This is where you would see your improvement in performance. If the urlopen and read take time in one thread and it is waiting on IO, that means another thread can do work. The queue objects job is simply to manage the locking access, and popping off items to the callers.

Upvotes: 1

Related Questions