Reputation: 7056
I was wondering if anyone has tried using [gevent][1] and [socksipy][2] for concurrent downloads.
Upvotes: 1
Views: 292
Reputation: 7180
I've used gevent to download ~12k pictures from yfrog, Instagram, Twitpic, etc. The cumulative size of the pictures was around 1.5 GB, and it took ~20 minutes to download them all over my home wifi.
To do so, I implemented an image_download function whose sole purpose was to download a picture from a given URL, then asynchronously mapped a list of URLs onto image_download using a gevent.pool.Pool.
from gevent import monkey
monkey.patch_socket()  # Make stdlib sockets cooperative; see http://www.gevent.org/gevent.monkey.html

from gevent.pool import Pool
import urllib2  # any blocking HTTP client works once the socket module is patched

NB_WORKERS = 50

def image_download(url):
    """ Download a single picture; the blocking read yields to other greenlets. """
    return urllib2.urlopen(url).read()

def parallel_image_download(urls):  # urls is of type list
    """ Activate NB_WORKERS Greenlets to asynchronously download the images. """
    pool = Pool(NB_WORKERS)
    return pool.map(image_download, urls)
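For example, a minimal driver (the URL pattern here is a placeholder, not the feeds I actually scraped):

if __name__ == '__main__':
    # Placeholder URLs for illustration only
    urls = ['http://example.com/img/%d.jpg' % i for i in range(1000)]
    images = parallel_image_download(urls)
    print '%d images downloaded' % len(images)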
NB: I settled on 50 parallel workers after a couple of tries. Beyond 50, the total runtime no longer improved.
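As for the socksipy part of the question: a minimal sketch, assuming a SOCKS5 proxy listening on 127.0.0.1:1080 (adjust to your setup), would be to apply gevent's monkey patch first and only then import socks, so that socksipy's socksocket subclasses gevent's cooperative socket:

from gevent import monkey
monkey.patch_socket()  # patch first, so socksipy wraps gevent's cooperative socket

import socket
import socks  # socksipy

# Assumption: a local SOCKS5 proxy; replace host/port with your own
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 1080)
socket.socket = socks.socksocket  # all new sockets now go through the proxy

With that in place, parallel_image_download should work unchanged, since urllib2 opens its connections through socket.socket.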
Upvotes: 3