etskinner

Reputation: 148

How do I use requests_html to asynchronously get() a list of URLs?

I'm trying to asynchronously get() a list of URLs using the Python package requests_html, similar to the async example in the README. I'm using Python 3.6.5 and requests_html 0.10.0.

My understanding is that AsyncHTMLSession.run() is supposed to work very much the same as asyncio.gather(): you give it a bunch of awaitables, and it runs all of them. Is that incorrect?

Here's the code I'm trying, which I expect should get the pages and store the responses:

from requests_html import AsyncHTMLSession

async def get_link(url):
    r = await asession.get(url)
    return r

asession = AsyncHTMLSession()
results = asession.run(get_link("http://google.com"), get_link("http://yahoo.com"))

But I'm getting this exception instead:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    results = asession.run(get_link("google.com"), get_link("yahoo.com"))
  File ".\venv\lib\site-packages\requests_html.py", line 772, in run
    asyncio.ensure_future(coro()) for coro in coros
  File ".\venv\lib\site-packages\requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'get_link' was never awaited

Am I doing something wrong?

Upvotes: 1

Views: 3720

Answers (1)

user4815162342

Reputation: 154916

Am I doing something wrong?

You are not calling asession.run correctly.

asyncio.gather accepts awaitable objects, such as coroutine objects obtained by just calling a coroutine (async) function. asession.run, on the other hand, accepts callables, such as async functions, which it will invoke to produce awaitables. The difference is like that between a function that accepts an iterable, to which you could pass an instantiated generator, and a function that accepts a callable returning an iterable, to which you could pass the generator function itself.
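To make the generator analogy concrete, here is a minimal sketch in plain Python. The names numbers, total_from_iterable and total_from_factory are made up for illustration and have nothing to do with requests_html; they only show how the two calling conventions differ.

```python
def numbers():
    # A generator function: calling it produces a fresh generator object.
    yield 1
    yield 2

def total_from_iterable(iterable):
    # Expects an already-created iterable (comparable to asyncio.gather
    # expecting awaitables that already exist).
    return sum(iterable)

def total_from_factory(factory):
    # Expects a callable and invokes it itself (comparable to asession.run
    # calling each argument to obtain an awaitable).
    return sum(factory())

total_a = total_from_iterable(numbers())  # pass the generator object
total_b = total_from_factory(numbers)     # pass the generator function

try:
    total_from_factory(numbers())         # wrong convention
except TypeError as exc:
    mismatch_error = exc                  # a generator object is not callable
```

Passing numbers() to total_from_factory fails for the same kind of reason as the traceback in the question: the receiving function tries to call an object that is not callable.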

Since your async functions take arguments, you cannot just pass get_link to asession.run; you must wrap each call in functools.partial or a lambda instead:

results = asession.run(
    lambda: get_link("http://google.com"),
    lambda: get_link("http://yahoo.com"),
)
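For a whole list of URLs, the functools.partial variant might look roughly like the sketch below. So that it runs without network access or requests_html installed, fetch and run_callables are hypothetical stand-ins for get_link and asession.run; run_callables only imitates the "call each argument, then await the result" convention and is not the library's actual implementation.

```python
import asyncio
from functools import partial

async def fetch(url):
    # Hypothetical stand-in for get_link(); performs no real request.
    await asyncio.sleep(0)
    return "fetched " + url

def run_callables(*factories):
    # Hypothetical stand-in for asession.run: invoke each zero-argument
    # callable to obtain a coroutine, then run them all to completion.
    loop = asyncio.new_event_loop()
    try:
        async def gather_all():
            return await asyncio.gather(*(factory() for factory in factories))
        return loop.run_until_complete(gather_all())
    finally:
        loop.close()

urls = ["http://google.com", "http://yahoo.com"]

# partial(fetch, url) binds the URL immediately and yields a callable
# that takes no arguments, which is what the runner expects.
results = run_callables(*(partial(fetch, url) for url in urls))
```

partial is the safer choice when building the callables in a loop: a bare lambda: fetch(url) looks up url only when it is called, so every lambda created in the loop would end up using the last URL unless the value is bound explicitly (for example lambda url=url: fetch(url)).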

Upvotes: 5
