Yu Pan

Reputation: 1

urllib2 urlopen blocks when used with multiprocessing

I want to use multiprocessing to speed up report generation for every company.

The following is my test script:

from multiprocessing import Pool
import os, time, random, json, urllib, urllib2, uuid

def generate_report(url, cookie, company_id, period, remark):
    try:
        start = time.time()
        print('Run task %s (%s)... at: %s \n' % (company_id, os.getpid(), start))

        values = {
            'companies': json.dumps([company_id]),
            'month_year': period,
            'remark': remark
        }

        data = urllib.urlencode(values)

        headers = {
            'Cookie': cookie
        }
        url = "%s?pid=%s&uuid=%s" % (url, os.getpid(), uuid.uuid4().hex)
        request = urllib2.Request(url, data, headers)
        response = urllib2.urlopen(request)
        content = response.read()
        end = time.time()
        print('Task %s runs %0.2f seconds, end at: %s \n' % (company_id, (end - start), end))
        return content
    except Exception as exc:
        return str(exc)  # exc.message is deprecated in Python 2

if __name__=='__main__':
    print('Parent process %s.\n' % os.getpid())
    p = Pool()

    url = 'http://localhost/fee_calculate/generate-single'
    cookie = 'xxx'
    company_ids = [17, 15, 21, 19]
    period = '2017-08'
    remark = 'test add remark from python script'

    results = [p.apply_async(generate_report, args=(url, cookie, company_id, period, remark)) for company_id in company_ids]
    for r in results:
        print(r.get())

But I get the following output:

Run task 17 (15952)... at: 1506568581.98
Run task 15 (17192)... at: 1506568581.99
Run task 21 (18116)... at: 1506568582.01
Run task 19 (1708)... at: 1506568582.05

Task 17 runs 13.50 seconds, end at: 1506568595.48

{"success":true,"info":"Successed!"}
Task 15 runs 23.60 seconds, end at: 1506568605.59

{"success":true,"info":"Successed!"}
Task 21 runs 34.35 seconds, end at: 1506568616.36

{"success":true,"info":"Successed!"}
Task 19 runs 44.38 seconds, end at: 1506568626.44

{"success":true,"info":"Successed!"}

It seems that urllib2.urlopen(request) is blocking: the requests are not sent in parallel, but one after another.

In order to test multiprocessing, the endpoint fee_calculate/generate-single contains only the following significant code:

sleep(10)

Please give me some advice, thanks.

PS: Platform: Windows 10, Python 2.7, 4 CPUs

Upvotes: 0

Views: 332

Answers (1)

DBrowne

Reputation: 723

This isn't a multiprocessing issue. Multiprocessing is working as it should, which you can see from the fact that all of the tasks start at approximately the same time.
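One quick way to confirm this (an illustrative check, not part of your script; `fake_report` is just a stand-in for `generate_report`) is to replace the HTTP call with a plain sleep and time the pool:

```python
import time
from multiprocessing import Pool


def fake_report(company_id):
    # Stand-in for generate_report: same kind of delay, no network.
    time.sleep(2)
    return company_id


if __name__ == '__main__':
    start = time.time()
    pool = Pool(4)
    # Four 2-second tasks on four workers finish in ~2s, not ~8s.
    results = pool.map(fake_report, [17, 15, 21, 19])
    pool.close()
    pool.join()
    print('elapsed: %.1fs, results: %s' % (time.time() - start, results))
```

If this finishes in roughly 2 seconds rather than 8, the pool itself is parallel and the bottleneck is on the server side.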

The task execution time is almost entirely dictated by the response time of your local endpoint at http://localhost/fee_calculate/generate-single. How are you running this server? If you observe the execution times for each of the reports you will notice that they are increasing in steps of ~10 seconds, which is your artificially imposed processing delay on the server side (sleep(10)).

I suspect that your local server is only single-threaded, and so can only handle one request at a time. This means that each request must be completed before the next one is processed, so when you make multiple concurrent requests like this you don't actually get any decrease in processing time.

Upvotes: 1
