Reputation: 9826
I want to do parallel HTTP request tasks in asyncio, but I find that python-requests would block the event loop of asyncio. I've found aiohttp, but it couldn't provide the service of an HTTP request using an HTTP proxy. So I want to know if there's a way to do asynchronous HTTP requests with the help of asyncio.
Upvotes: 219
Views: 232746
Reputation: 67
Requests does not support asyncio. You can use aiohttp instead, since aiohttp fully supports asyncio and has better performance than requests.
Alternatively, you can use requests with traditional multithreading:
import concurrent.futures
import requests

def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future1 = executor.submit(requests.get, 'http://www.google.com')
        future2 = executor.submit(requests.get, 'http://www.google.co.uk')
        print(future1.result().text)
        print(future2.result().text)

main()
You can use loop.run_in_executor to integrate an executor into asyncio. The above code is semantically equivalent to:
import asyncio
import requests

@asyncio.coroutine
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
With this approach, you can use any other blocking library with asyncio.
With Python 3.5+ you can use the new async/await syntax:
import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    print((await future1).text)
    print((await future2).text)

asyncio.run(main())
See PEP 492 for more.
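The same pattern works for any blocking call, not just requests. A minimal sketch, using time.sleep as a stand-in for an arbitrary blocking library call:

import asyncio
import time

async def main():
    loop = asyncio.get_event_loop()
    # the blocking sleep runs in a worker thread, so the event loop stays free
    await loop.run_in_executor(None, time.sleep, 1)
    print('event loop was not blocked')

asyncio.run(main())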
With Python 3.9+, it's even simpler using asyncio.to_thread:
import asyncio
import requests

async def main():
    # to_thread returns a coroutine that doesn't start until awaited,
    # so wrap each one in a task to start both requests immediately
    future1 = asyncio.create_task(asyncio.to_thread(requests.get, 'http://www.google.com'))
    future2 = asyncio.create_task(asyncio.to_thread(requests.get, 'http://www.google.co.uk'))
    print((await future1).text)
    print((await future2).text)

asyncio.run(main())
asyncio.to_thread has another advantage: it accepts keyword arguments, while loop.run_in_executor doesn't.
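For instance, a minimal sketch (timeout is just an illustrative keyword argument):

import asyncio
import functools
import requests

async def main():
    # to_thread forwards keyword arguments directly
    r1 = await asyncio.to_thread(requests.get, 'http://www.google.com', timeout=10)
    # with run_in_executor, keyword arguments must be bound via functools.partial
    loop = asyncio.get_running_loop()
    r2 = await loop.run_in_executor(
        None, functools.partial(requests.get, 'http://www.google.com', timeout=10))
    print(r1.status_code, r2.status_code)

asyncio.run(main())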
Keep in mind that all of the above code actually uses multithreading under the hood instead of asyncio, so consider using an asynchronous HTTP client such as aiohttp to achieve true asynchrony.
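For reference, a minimal sketch of the truly asynchronous alternative (the aiohttp answers below go into more detail):

import asyncio
import aiohttp

async def main():
    # no threads involved: the request itself is non-blocking
    async with aiohttp.ClientSession() as session:
        async with session.get('http://www.google.com') as response:
            print((await response.text())[:100])

asyncio.run(main())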
Upvotes: 0
Reputation: 1755
aiohttp can be used with an HTTP proxy already:

import asyncio
import aiohttp

async def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    async with aiohttp.request(
        'GET', 'http://google.com',
        proxy=proxy_url,
    ) as response:
        # read the body while the connection is still open
        return await response.text()

asyncio.run(do_request())
Upvotes: 115
Reputation: 2653
To use requests (or any other blocking libraries) with asyncio, you can use BaseEventLoop.run_in_executor to run a function in another thread and yield from it to get the result. For example:
import asyncio
import requests
@asyncio.coroutine
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
This will get both responses in parallel.
With Python 3.5 you can use the new async/await syntax:
import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    # start both requests before awaiting, so they run in parallel
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

asyncio.run(main())
See PEP 492 for more.
Upvotes: 246
Reputation: 51
python-requests does not natively support asyncio yet. Going with a library that natively supports asyncio, like httpx, would be the most beneficial approach. However, if your use case relies heavily on python-requests, you can wrap the sync calls with asyncio.to_thread and asyncio.gather and follow the asyncio programming patterns:
import asyncio
import requests

async def main():
    # pass the function and its arguments to to_thread; don't call it directly
    res = await asyncio.gather(asyncio.to_thread(requests.get, "YOUR_URL"))

if __name__ == "__main__":
    asyncio.run(main())
For concurrency/parallelization of the network requests:
import asyncio
import requests

urls = ["URL_1", "URL_2"]

async def make_request(url: str):
    # run the blocking requests.get in a worker thread
    response = await asyncio.to_thread(requests.get, url)
    return response

async def main():
    # unpack the generator so gather receives each coroutine separately
    responses = await asyncio.gather(*(make_request(url) for url in urls))
    for response in responses:
        print(response)

if __name__ == "__main__":
    asyncio.run(main())
Upvotes: 3
Reputation: 10401
The answers above are still using the old Python 3.4 style coroutines. Here is what you would write on Python 3.5+. aiohttp supports HTTP proxies now:
import aiohttp
import asyncio

async def fetch(session, url):
    # pass proxy='http://your-proxy:port' here if you need one
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        'http://python.org',
        'https://google.com',
        'http://yifei.me',
    ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
There is also the httpx library, which is a drop-in replacement for requests with async/await support. However, httpx is somewhat slower than aiohttp.

Another option is curl_cffi, which has the ability to impersonate browsers' JA3 and HTTP/2 fingerprints.
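A minimal curl_cffi sketch (assuming a recent version; the impersonate target 'chrome' is illustrative):

import asyncio
from curl_cffi.requests import AsyncSession

async def main():
    async with AsyncSession() as session:
        # impersonate makes the TLS/HTTP2 fingerprint look like a real browser
        response = await session.get('https://google.com', impersonate='chrome')
        print(response.status_code)

asyncio.run(main())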
Upvotes: 95
Reputation: 1648
Considering that aiohttp is a fully featured web framework, I'd suggest using something more lightweight like httpx (https://www.python-httpx.org/), which supports async requests. It has an almost identical API to requests:
>>> async with httpx.AsyncClient() as client:
... r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
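And a minimal sketch of running several requests concurrently with it (the URLs are placeholders):

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        # gather schedules both requests on the same connection pool concurrently
        r1, r2 = await asyncio.gather(
            client.get('https://www.example.com/'),
            client.get('https://www.example.org/'),
        )
        print(r1.status_code, r2.status_code)

asyncio.run(main())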
Upvotes: 10
Reputation: 111
DISCLAIMER: The following code creates a different thread for each function call.
This might be useful for some cases, as it is simpler to use. But know that it is not async; it only gives the illusion of async using multiple threads, even though the decorator's name suggests otherwise.
To make any function non-blocking, copy the decorator below and decorate any function with a callback function as a parameter. The callback function will receive the data returned from the decorated function.
import asyncio
import requests

def run_async(callback):
    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out
            return asyncio.get_event_loop().run_in_executor(None, __exec)
        return wrapper
    return inner

def _callback(*args):
    print(args)

# Must provide a callback function; it will be executed after the decorated function completes
@run_async(_callback)
def get(url):
    return requests.get(url)

get("https://google.com")
print("Non blocking code ran !!")
Upvotes: 2
Reputation: 1204
There is a good example of async/await loops and threading in an article by Pimin Konstantin Kefaloukos, Easy parallel HTTP requests with Python and asyncio:
To minimize the total completion time, we could increase the size of the thread pool to match the number of requests we have to make. Luckily, this is easy to do as we will see next. The code listing below is an example of how to make twenty asynchronous HTTP requests with a thread pool of twenty worker threads:
# Example 3: asynchronous requests with larger thread pool
import asyncio
import concurrent.futures
import requests

async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor,
                requests.get,
                'http://example.org/'
            )
            for i in range(20)
        ]
        for response in await asyncio.gather(*futures):
            pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Upvotes: 12
Reputation: 15558
Requests does not currently support asyncio and there are no plans to provide such support. It's likely that you could implement a custom "Transport Adapter" (as discussed here) that knows how to use asyncio.
If I find myself with some time it's something I might actually look into, but I can't promise anything.
Upvotes: 14