Reputation: 3483
Let's say I have a function:
from time import sleep

def doSomethingThatTakesALongTime(number):
    print number
    sleep(10)
and then I call it in a for loop:
for number in range(10):
    doSomethingThatTakesALongTime(number)
How can I set this up so that it takes only 10 seconds TOTAL to print out:
$ 0123456789
instead of taking 100 seconds? If it helps, I'm going to use the information you provide to do asynchronous web scraping, i.e. I have a list of sites I want to visit, but I want to visit them simultaneously rather than wait for each one to complete.
Upvotes: 1
Views: 1312
Reputation: 9521
Just in case, here is the exact way to apply green threads to your example snippet:
from eventlet.green.time import sleep
from eventlet.greenpool import GreenPool

def doSomethingThatTakesALongTime(number):
    print number
    sleep(10)

pool = GreenPool()
for number in range(100):
    pool.spawn_n(doSomethingThatTakesALongTime, number)

import timeit
print timeit.timeit("pool.waitall()", "from __main__ import pool")
# yields: 10.9335260363
Upvotes: 0
Reputation: 86
asyncoro supports asynchronous, concurrent programming. It includes an asynchronous (non-blocking) socket implementation. If your implementation does not need urllib/httplib etc. (which don't have asynchronous completions), it may fit your purpose, and it is easy to use, as it is very similar to programming with threads. Your above problem with asyncoro:
import asyncoro

def do_something(number, coro=None):
    print number
    yield coro.sleep(10)

for number in range(10):
    asyncoro.Coro(do_something, number)
Upvotes: 2
Reputation: 2049
Try Eventlet; the first example in its documentation shows how to implement simultaneous URL fetching:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print "got body", len(body)
I can also advise looking at Celery for a more flexible solution.
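To give an idea of what that would look like, here is a minimal sketch of the fetch above as a Celery task; the module name and the broker/backend URLs are assumptions for the example (a broker such as Redis must be running), not Celery defaults:

# tasks.py -- hypothetical module name; start a worker with: celery -A tasks worker
import urllib2
from celery import Celery

# broker/backend URLs are placeholders for this sketch
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def fetch(url):
    # each call runs on a worker process, so many URLs download at once
    return urllib2.urlopen(url).read()

Calling fetch.delay(url) for each URL queues them all immediately; the bodies can then be collected by calling .get() on the returned AsyncResult objects.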
Upvotes: 2
Reputation: 2733
Take a look at the Scrapy framework. It is intended specifically for web scraping and is very good. It is asynchronous and built on the Twisted framework.
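To show the shape of a Scrapy solution, here is a minimal sketch using the scrapy.Spider API; the spider name, URLs, and item fields are placeholders for illustration:

import scrapy

class BodySpider(scrapy.Spider):
    # hypothetical spider; run with: scrapy runspider body_spider.py
    name = 'bodies'
    start_urls = [
        'http://www.google.com/intl/en_ALL/images/logo.gif',
        'https://wiki.secondlife.com/w/images/secondlife.jpg',
    ]

    def parse(self, response):
        # Scrapy downloads the start_urls concurrently through Twisted's
        # event loop and calls parse once per response
        yield {'url': response.url, 'length': len(response.body)}

Scrapy handles scheduling, retries, and concurrency limits itself, so you only write the parsing code.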
Upvotes: 1