aseylys
aseylys

Reputation: 344

Mutliprocessing Queue vs. Pool

I'm having the hardest time trying to figure out the difference in usage between multiprocessing.Pool and multiprocessing.Queue.

To help, this is bit of code is a barebones example of what I'm trying to do.

def update():
    def _hold(url):
        soup = BeautifulSoup(url)
        return soup
    def _queue(url):
        soup = BeautifulSoup(url)
        li = [l for l in soup.find('li')]
        return True if li else False

    url = 'www.ur_url_here.org'
    _hold(url)
    _queue(url)

I'm trying to run _hold() and _queue() at the same time. I'm not trying to have them communicate with each other so there is no need for a Pipe. update() is called every 5 seconds.

I can't really rap my head around the difference between creating a pool of workers, or creating a queue of functions. Can anyone assist me?

The real _hold() and _queue() functions are much more elaborate than the example so concurrent execution actually is necessary, I just thought this example would suffice for asking the question.

Upvotes: 13

Views: 7161

Answers (1)

noxdafox
noxdafox

Reputation: 15020

The Pool and the Queue belong to two different levels of abstraction.

The Pool of Workers is a concurrent design paradigm which aims to abstract a lot of logic you would otherwise need to implement yourself when using processes and queues.

The multiprocessing.Pool actually uses a Queue internally for operating.

If your problem is simple enough, you can easily rely on a Pool. In more complex cases, you might need to deal with processes and queues yourself.

For your specific example, the following code should do the trick.

def hold(url):
    ...
    return soup

def queue(url):
    ...
    return bool(li)

def update(url):
    with multiprocessing.Pool(2) as pool:
        hold_job = pool.apply_async(hold, args=[url])
        queue_job = pool.apply_async(queue, args=[url])

        # block until hold_job is done
        soup = hold_job.get()
        # block until queue_job is done
        li = queue_job.get()

I'd also recommend you to take a look at the concurrent.futures module. As the name suggest, that is the future proof implementation for the Pool of Workers paradigm in Python.

You can easily re-write the example above with that library as what really changes is just the API names.

Upvotes: 17

Related Questions