Reputation: 460
For this code:
import sys
import gevent
from gevent import monkey

monkey.patch_all()
import requests
import urllib2


def worker(url, use_urllib2=False):
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
    else:
        content = requests.get(url, prefetch=True).content.lower()
    title = content.split('<title>')[1].split('</title>')[0].strip()

urls = ['http://www.mail.ru'] * 5


def by_requests():
    jobs = [gevent.spawn(worker, url) for url in urls]
    gevent.joinall(jobs)


def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)


if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds' % t.timeit(number=3)
    sys.exit(0)
I get this result:
by requests: 18.3397213892 seconds
by urllib2: 2.48605842363 seconds
In a packet sniffer it looks like this:
[sniffer screenshot: the first 5 requests are sent by the requests library, the next 5 by urllib2]
Red is the time when the work was frozen, dark is when data was being received... wtf?!
How is that possible if the socket library is patched and both libraries should work identically? How can I use requests without requests.async for asynchronous work?
Upvotes: 19
Views: 25429
Reputation: 1520
From the requests documentation, Blocking Or Non-Blocking:
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python's asynchronicity frameworks. Two excellent examples are grequests and requests-futures.
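For example, a minimal sketch using grequests (a hypothetical benchmark against the same URLs; assumes grequests is installed, with its documented grequests.get/grequests.map API):
import grequests  # third-party library that pairs Requests with gevent

urls = ['http://www.mail.ru'] * 5
rs = (grequests.get(u) for u in urls)  # build the requests without sending them
responses = grequests.map(rs)          # send them all concurrently
print [r.status_code for r in responses]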
Upvotes: 2
Reputation: 460
Sorry, Kenneth Reitz. Your library is wonderful.
My mistake: I need to enable the monkey patch for httplib, like this:
gevent.monkey.patch_all(httplib=True)
because the patch for httplib is disabled by default.
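For completeness, a minimal sketch of the corrected setup, assuming a gevent 0.13.x-era release (later gevent versions removed the httplib flag from patch_all):
from gevent import monkey
monkey.patch_all(httplib=True)  # patch httplib before importing requests

import requests  # now performs its IO through the gevent-patched httplib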
Upvotes: 15
Reputation: 3325
As Kenneth pointed out, another thing we can do is let the requests module handle the asynchronous part. I've changed your code accordingly. Again, for me the results consistently show that the requests module performs better than urllib2.
Doing this means that we cannot "thread" the callback part. But that should be okay, because the major gain should only be expected with the HTTP requests themselves, due to the request/response delay.
import sys
import gevent
from gevent import monkey

monkey.patch_all()
import requests
from requests import async  # requests.async existed in old requests releases
import urllib2


def call_back(resp):
    content = resp.content.lower()  # lowercase so the '<title>' split matches
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title


def worker(url, use_urllib2=False):
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()
    else:
        # in this branch 'url' is actually the whole list of URLs
        rs = [async.get(u) for u in url]
        resps = async.map(rs)
        for resp in resps:
            call_back(resp)

urls = ['http://www.mail.ru'] * 5


def by_requests():
    worker(urls)


def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)


if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds' % t.timeit(number=3)
    sys.exit(0)
Here's one of my results:
by requests: 2.44117593765 seconds
by urllib2: 4.41298294067 seconds
Upvotes: 7
Reputation: 8846
Requests has gevent support integrated into the codebase:
http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests
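That page documents the old requests.async module (requests 0.x; it was later split out into the separate grequests project). A minimal usage sketch, assuming that old API:
from requests import async  # only present in requests 0.x releases

urls = ['http://www.mail.ru'] * 5
rs = [async.get(u) for u in urls]  # build unsent requests
resps = async.map(rs)              # send them concurrently via gevent
print [r.url for r in resps]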
Upvotes: 6
Reputation: 3325
I ran your code on my machine (python 2.7.1, gevent 0.13.0, requests 0.10.6). It turned out that the time was always a good second or two faster when using the requests module. What versions are you using? An upgrade might simply solve the issue for you.
by requests: 3.7847161293 seconds
by urllib2: 4.92611193657 seconds
by requests: 2.90777993202 seconds
by urllib2: 7.99798607826 seconds
Upvotes: 2