garyHuang

Reputation: 3

aiohttp requests block with async/await - Python

I'm a newbie at asyncio and aiohttp. Recently, I've been practicing to understand how the event loop actually works.

When I practiced sending requests to several URLs concurrently, I ran into a problem. As I understand it, create_task puts the coroutine into the event loop, and await lets the event loop switch to other tasks until the awaited task is done. But the result below surprised me: the first part of blockmain runs synchronously (in blocking mode), while the second part works as I expected (it behaves the way I understand async/await and asyncio should). I'm not sure whether I have misunderstood async/await and asyncio here. If someone really knows how this works, could you please give me a detailed answer? It really bothers me.

Sorry for my poor English.

Here is my code:

import asyncio
from aiohttp import ClientSession, ClientTimeout

urls = [
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=1&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=2&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=3&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=4&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=5&jobsource=2018indexpoc&ro=0',
'http://www.httpbin.org:12345/',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=6&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=7&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=8&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=9&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=10&jobsource=2018indexpoc&ro=0']

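async def fetch_(link):
    # loop = asyncio.get_event_loop()
    # print(asyncio.all_tasks(loop))
    try:
        async with ClientSession(timeout=ClientTimeout(total=10)) as session:
            async with session.get(link) as response:
                html_body = await response.text()
                print(f"{link} is done")
    except Exception as exc:
        # e.g. the httpbin.org:12345 URL fails; report it instead of aborting blockmain
        print(f"{link} failed: {exc!r}")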

async def blockmain():
    # ========================= the following 2 lines don't work as I expected
    for link in urls:
        await asyncio.create_task(fetch_(link))
    
    # second part
    # ========================= the following 3 lines work as I expected
    # loop 1
    tasks = [asyncio.create_task(fetch_(link)) for link in urls]
    for t in tasks:
        await t
    # loop 2
    tasks = [asyncio.create_task(fetch_(link)) for link in urls]
    for t in tasks:
        await t

asyncio.run(blockmain())

I want to know why the program runs synchronously (in blocking mode) when I await asyncio.create_task inside the for loop, but runs asynchronously when I create all the tasks first and then await them.

Thanks.

Upvotes: 0

Views: 997

Answers (1)

Paul Cornelius

Reputation: 11009

In the first case you are not running the tasks concurrently.

for link in urls:
    await asyncio.create_task(fetch_(link))

The expression asyncio.create_task schedules the coroutine fetch_(link) as a task. The await keyword suspends the current task (blockmain) and waits for the fetch_ task to complete. Those are the only two tasks at that point. When the fetch_ task finishes, the main task continues. It goes through the loop again with a new value for link, and the process repeats. You never have two fetch_ tasks running at the same time, since you await each task as soon as you create it. There is no useful concurrent execution.
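
To see this without any network access, here is a minimal sketch that assumes asyncio.sleep as a stand-in for the HTTP request; fake_fetch and sequential are hypothetical names, not part of the question. Each fake request records len(asyncio.all_tasks()), which should report only two unfinished tasks every time (the main task and the single fetch task), and the total time is roughly the sum of the delays.

```python
import asyncio
import time

task_counts = []


async def fake_fetch(name, delay):
    # Hypothetical stand-in for fetch_: asyncio.sleep plays the role of the network wait.
    # Record how many unfinished tasks exist on the loop (main task + fetch tasks).
    task_counts.append(len(asyncio.all_tasks()))
    await asyncio.sleep(delay)
    return name


async def sequential(names, delay):
    results = []
    for name in names:
        # Awaiting immediately after create_task: the next task is not created
        # until this one has finished, so only one fake_fetch exists at a time.
        results.append(await asyncio.create_task(fake_fetch(name, delay)))
    return results


start = time.perf_counter()
results = asyncio.run(sequential(["a", "b", "c"], 0.2))
elapsed = time.perf_counter() - start
print(task_counts, results, f"{elapsed:.2f}s")  # roughly 3 * 0.2 s in total
```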

In the second case you get concurrent execution, since you create all the tasks before the first await. The tasks don't start running until blockmain reaches that first await; after that, the instances of fetch_ take turns, switching from one task to another each time one of them needs to await something (such as the network response).
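
The same hypothetical setup (asyncio.sleep standing in for the request, fake_fetch and concurrent as illustrative names) shows the opposite behavior when every task is created before the first await: no task starts until the main task awaits, each fake request sees four unfinished tasks (main plus three fetches), and the total time is close to a single delay.

```python
import asyncio
import time

events = []
task_counts = []


async def fake_fetch(name, delay):
    # Hypothetical stand-in for fetch_; records when it starts and finishes
    events.append(f"start {name}")
    task_counts.append(len(asyncio.all_tasks()))  # main task + pending fetch tasks
    await asyncio.sleep(delay)  # yields to the event loop so other tasks can run
    events.append(f"done {name}")
    return name


async def concurrent(names, delay):
    # Every task is scheduled before the first await, so their waits overlap
    tasks = [asyncio.create_task(fake_fetch(name, delay)) for name in names]
    events.append("all tasks created")  # no task has started running yet
    results = []
    for t in tasks:
        results.append(await t)
    return results


start = time.perf_counter()
results = asyncio.run(concurrent(["a", "b", "c"], 0.2))
elapsed = time.perf_counter() - start
print(events)
print(task_counts, results, f"{elapsed:.2f}s")  # roughly 0.2 s, not 0.6 s
```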

However, the code for your second case is longer than it needs to be. See the documentation for the asyncio.gather function. You could replace each three-line loop with one line, like this:

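await asyncio.gather(*(fetch_(link) for link in urls))

The * unpacks the generator, because gather takes the awaitables as separate positional arguments; passing the generator object itself raises a TypeError. The gather function automatically wraps each coroutine in a task, waits until they are all finished, and returns their results in the same order as the arguments.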
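
A minimal sketch of gather under the same assumptions (asyncio.sleep standing in for the request; fake_fetch and its fail flag are hypothetical) shows that results come back in argument order, not completion order. It also passes return_exceptions=True, a real gather parameter, so that a failing request (like the unreachable httpbin URL in the question) is returned as an exception object in the results list instead of propagating out of the await.

```python
import asyncio


async def fake_fetch(name, delay, fail=False):
    # Hypothetical stand-in for fetch_; fail=True mimics a request that errors out
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"{name} failed")
    return name


async def main():
    jobs = [("a", 0.3, False), ("b", 0.1, True), ("c", 0.2, False)]
    # The * unpacks the generator: gather takes awaitables as separate arguments
    return await asyncio.gather(
        *(fake_fetch(name, delay, fail) for name, delay, fail in jobs),
        return_exceptions=True,  # collect exceptions as results instead of raising
    )


results = asyncio.run(main())
for r in results:
    print(repr(r))
```

Even though "b" finishes first and "c" before "a", the list is ordered a, b, c, with the exception object in the second slot.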

Upvotes: 0
