TJB

Reputation: 3846

Unexpected Behavior from Python Multiprocessing Pool Class

I am trying to use Python's multiprocessing library to run a function quickly across the 8 processing cores of a Linux VM I created. As a test, I am timing, in seconds, how long a worker pool with 4 processes takes to run a function, versus how long the same function takes without a worker pool. The times are coming out about the same, and in some cases the worker pool takes much longer than running without one.

Script

import requests
import datetime
import multiprocessing as mp

url = 'http://example.com'  # any reachable site to test against
shared_results = []

def stress_test_url(url):
    print('Starting Stress Test')
    count = 0

    while count <= 200:
        response = requests.get(url)
        shared_results.append(response.status_code)
        count += 1

pool = mp.Pool(processes=4)

now = datetime.datetime.now()
results = pool.apply(stress_test_url, args=(url,))
diff = (datetime.datetime.now() - now).total_seconds()

now = datetime.datetime.now()
results = stress_test_url(url)
diff2 = (datetime.datetime.now() - now).total_seconds()

print(diff)
print(diff2)

Terminal Output

Starting Stress Test
Starting Stress Test
44.316212
41.874116

Upvotes: 1

Views: 120

Answers (1)

noxdafox

Reputation: 15040

The apply function of multiprocessing.Pool simply runs a function in a separate process and waits for its result. It takes slightly longer than running sequentially, because the job has to be pickled and shipped to the child process over a pipe.
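A minimal sketch of that behaviour (the `square` function here is just an illustration): `apply` dispatches one call to a worker and blocks until the result comes back, exactly like a direct call plus the dispatch overhead.

```python
import multiprocessing as mp

def square(x):
    # Trivial worker: runs in a child process when dispatched via apply().
    return x * x

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        # apply() pickles the call, ships it to one worker, and blocks
        # until that single result is returned -- no parallelism here.
        result = pool.apply(square, args=(7,))
    print(result)  # 49
```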

multiprocessing doesn't make sequential operations faster; it simply allows them to run in parallel if your hardware has more than one core.

Just try this:

urls = ["http://google.com", 
        "http://example.com", 
        "http://stackoverflow.com", 
        "http://python.org"]

results = pool.map(stress_test_url, urls)

You will see the 4 URLs being visited seemingly at the same time. This means the total runtime for N websites drops to roughly the sequential time divided by the number of processes.
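The same division of work shows up with CPU-bound tasks, where the effect is easier to measure than with HTTP requests. A self-contained sketch (the function name and workload sizes are illustrative), comparing a plain loop against pool.map over the same jobs:

```python
import multiprocessing as mp
import time

def busy_sum(n):
    # CPU-bound work, so any speedup comes from using extra cores.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    jobs = [500_000] * 4

    start = time.perf_counter()
    sequential = [busy_sum(n) for n in jobs]
    seq_time = time.perf_counter() - start

    with mp.Pool(processes=4) as pool:
        start = time.perf_counter()
        # map() splits the jobs across the 4 workers and collects results.
        parallel = pool.map(busy_sum, jobs)
        par_time = time.perf_counter() - start

    assert sequential == parallel
    print(f'sequential: {seq_time:.3f}s, pool.map: {par_time:.3f}s')
```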

Lastly, benchmarking a function that performs HTTP requests is a very poor way to measure performance, because networks are unreliable: you will rarely get two executions that take the same amount of time, whether or not you use multiprocessing.
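If you do want meaningful timings, time a deterministic local workload with time.perf_counter (a monotonic, high-resolution clock) and take the median of several runs. A small sketch of that idea (the `benchmark` helper is hypothetical, not a standard API):

```python
import statistics
import time

def benchmark(func, *args, repeats=5):
    # Time several runs and report the median, which is less sensitive
    # to one-off spikes than a single measurement.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

print(benchmark(sorted, list(range(100_000))))
```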

Upvotes: 2
