Reputation: 7144
I've got a piece of code:
for url in get_lines(file):
    visit(url, timeout=timeout)
It reads URLs from a file and visits each one (using urllib2) in a for loop.
Is it possible to do this in a few threads? For example, 10 visits at the same time.
I've tried:
for url in get_lines(file):
    Thread(target=visit, args=(url,), kwargs={"timeout": timeout}).start()
But it does not seem to have any effect; the URLs are still visited as before.
A simplified version of the visit function:
def visit(url, proxy_addr=None, timeout=30):
    (...)
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    return response.read()
Upvotes: 4
Views: 4610
Reputation: 151027
I suspect that you've run into the Global Interpreter Lock. Basically, threading in Python can't give you true parallelism, which seems to be your goal. You need to use multiprocessing instead.
multiprocessing is designed to have a roughly analogous interface to threading, but it has a few quirks. Your visit function as written above should work correctly, I believe, because it's written in a functional style, without side effects.
In multiprocessing, the Process class is the equivalent of the Thread class in threading. It has all the same methods, so it's a drop-in replacement in this case. (Though I suppose you could use Pool as JoeZuntz suggests -- but I would test with the basic Process class first, to see if it fixes the problem.)
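For example, here is a minimal sketch of that drop-in swap, reusing get_lines, visit, file, and timeout from the question. It starts one process per URL, so for a very long URL list a Pool would be more economical:

from multiprocessing import Process

workers = []
for url in get_lines(file):
    # Process accepts the same target/args/kwargs as threading.Thread
    p = Process(target=visit, args=(url,), kwargs={"timeout": timeout})
    p.start()
    workers.append(p)

# wait for all the downloads to finish
for p in workers:
    p.join()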
Upvotes: 1
Reputation: 1179
To expand on senderle's answer, you can use the Pool class in multiprocessing to do this easily:
from multiprocessing import Pool
pool = Pool(processes=5)
pages = pool.map(visit, get_lines(file))
When the map call returns, "pages" will be a list of the contents of the URLs. You can adjust the number of processes to whatever is suitable for your system.
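If you also want to pass the timeout keyword from the question, one option (a sketch, not part of the original answer) is a small module-level wrapper, since the function handed to pool.map must be picklable; the name visit_with_timeout and the hard-coded value are illustrative:

from multiprocessing import Pool

def visit_with_timeout(url):
    # module-level so it can be pickled and sent to the worker processes
    return visit(url, timeout=30)

pool = Pool(processes=10)  # 10 visits at a time, as asked in the question
pages = pool.map(visit_with_timeout, get_lines(file))
pool.close()
pool.join()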
Upvotes: 5