Mateusz Jagiełło

Reputation: 7144

Python 2.5 - multi-threaded for loop

I've got a piece of code:

for url in get_lines(file):
    visit(url, timeout=timeout)

It reads URLs from a file and visits each of them (via urllib2) in a for loop.

Is it possible to do this in a few threads? For example, 10 visits at the same time.


I've tried:

for url in get_lines(file):
    Thread(target=visit, args=(url,), kwargs={"timeout": timeout}).start()

But it does not work - it has no effect, and the URLs are visited as before.


A simplified version of the visit function:

def visit(url, proxy_addr=None, timeout=30):
    (...)
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    return response.read()

Upvotes: 4

Views: 4610

Answers (2)

senderle

Reputation: 151027

I suspect that you've run into the Global Interpreter Lock. Basically, threading in Python can't achieve true parallelism, which seems to be your goal. You need to use multiprocessing instead.

multiprocessing is designed to have a roughly analogous interface to threading, but it has a few quirks. Your visit function as written above should work correctly, I believe, because it's written in a functional style, without side effects.

In multiprocessing, the Process class is the equivalent of the Thread class in threading. It has all the same methods, so it's a drop-in replacement in this case. (Though I suppose you could use Pool as JoeZuntz suggests -- but I would test with the basic Process class first, to see if it fixes the problem.)
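
To make that concrete, here's a minimal sketch of the swap, assuming visit and get_lines are defined as in the question:

from multiprocessing import Process

processes = []
for url in get_lines(file):
    # Process takes the same target/args/kwargs arguments as Thread
    p = Process(target=visit, args=(url,), kwargs={"timeout": timeout})
    p.start()
    processes.append(p)

# Wait for every fetch to finish
for p in processes:
    p.join()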

Upvotes: 1

JoeZuntz

Reputation: 1179

To expand on senderle's answer, you can use the Pool class in multiprocessing to do this easily:

from multiprocessing import Pool

# 5 worker processes fetch the URLs in parallel; map blocks until all are done
pool = Pool(processes=5)
pages = pool.map(visit, get_lines(file))

When the map call returns, "pages" will be a list of the contents of the URLs. You can adjust the number of processes to whatever is suitable for your system.
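
If you also need the timeout keyword from the question, one option (just a sketch, not part of the code above) is a small module-level wrapper, since pool.map only passes each URL as a single positional argument:

from multiprocessing import Pool

# Hypothetical wrapper so each worker uses the desired timeout; it must be
# defined at module level so it can be pickled and sent to the workers.
def visit_with_timeout(url):
    return visit(url, timeout=30)

pool = Pool(processes=5)
pages = pool.map(visit_with_timeout, get_lines(file))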

Upvotes: 5
