Mateusz Jagiełło

Reputation: 7144

Python 2.5 - multi-threaded for loop

I've got a piece of code:

for url in get_lines(file):
    visit(url, timeout=timeout)

It reads URLs from a file and visits each of them (via urllib2) in a for loop.

Is it possible to do this in a few threads? For example, 10 visits at the same time.


I've tried:

for url in get_lines(file):
    Thread(target=visit, args=(url,), kwargs={"timeout": timeout}).start()

But it does not work - it has no effect, and the URLs are visited as before.


A simplified version of the visit function:

def visit(url, proxy_addr=None, timeout=30):
    (...)
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    return response.read()

Upvotes: 4

Views: 4610

Answers (2)

senderle

Reputation: 151027

I suspect that you've run into the Global Interpreter Lock. Basically, threading in Python can't achieve true parallelism, which seems to be your goal. You need to use multiprocessing instead.

multiprocessing is designed to have a roughly analogous interface to threading, but it has a few quirks. Your visit function as written above should work correctly, I believe, because it's written in a functional style, without side effects.

In multiprocessing, the Process class is the equivalent of the Thread class in threading. It has all the same methods, so it's a drop-in replacement in this case. (Though I suppose you could use Pool as JoeZuntz suggests -- but I would test with the basic Process class first, to see if it fixes the problem.)
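
To make that concrete, here's a minimal sketch of the swap, assuming visit and get_lines are defined as in the question:

from multiprocessing import Process

processes = []
for url in get_lines(file):
    # Process takes the same target/args/kwargs arguments as Thread
    p = Process(target=visit, args=(url,), kwargs={"timeout": timeout})
    p.start()
    processes.append(p)

# Wait for every fetch to finish
for p in processes:
    p.join()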

Upvotes: 1

JoeZuntz

Reputation: 1179

To expand on senderle's answer, you can use the Pool class in multiprocessing to do this easily:

from multiprocessing import Pool

# 5 worker processes fetch the URLs in parallel; map blocks until all are done
pool = Pool(processes=5)
pages = pool.map(visit, get_lines(file))

When the map call returns, "pages" will be a list of the contents of the URLs. You can adjust the number of processes to whatever is suitable for your system.
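
If you also need the timeout keyword from the question, one option (just a sketch, not part of the code above) is a small module-level wrapper, since pool.map only passes each URL as a single positional argument:

from multiprocessing import Pool

# Hypothetical wrapper so each worker uses the desired timeout; it must be
# defined at module level so it can be pickled and sent to the workers.
def visit_with_timeout(url):
    return visit(url, timeout=30)

pool = Pool(processes=5)
pages = pool.map(visit_with_timeout, get_lines(file))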

Upvotes: 5
