mitnk

Reputation: 3307

Python asyncio Await Tasks

Requires Python 3.7 or later.

Two functions, main1 and main2, are defined below. One creates all the tasks and then awaits them at the end; the other creates and awaits one task at a time.

main1 takes 2 seconds while main2 takes 30 seconds. Why?

import asyncio

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main1():
    tasks = []
    for _ in range(10):
        task1 = asyncio.create_task(say_after(1, 'hello'))
        task2 = asyncio.create_task(say_after(2, 'world'))
        tasks.append(task1)
        tasks.append(task2)
    for x in tasks:
        await x

async def main2():
    for _ in range(10):
        await asyncio.create_task(say_after(1, 'hello'))
        await asyncio.create_task(say_after(2, 'world'))

asyncio.run(main2())

EDIT 1:

Here is a main3 version, which takes 20 seconds. I'd say the whole thing is just counterintuitive :(

async def main3():
    for _ in range(10):
        task1 = asyncio.create_task(say_after(1, 'hello'))
        task2 = asyncio.create_task(say_after(2, 'world'))
        await task1
        await task2

EDIT 2:

(With some more sample code added below.) I've read the detailed answer from @freakish, but I'm still stuck on one point: so only consecutive awaits cooperate and run in parallel (main4)?

Since create_task() takes no time (right?), why don't both awaits in main5 run in the background, so that main5 would take the max time of (task1, task2)?

Is this await mechanism by design, or just an asyncio limitation (in design or in implementation)?

And is the detailed behavior of await defined anywhere in the official Python docs?

# took 2 seconds
async def main4():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    task2 = asyncio.create_task(say_after(2, 'world'))
    await task1
    await task2

# took 3 seconds
async def main5():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    await task1
    task2 = asyncio.create_task(say_after(2, 'world'))
    await task2

Upvotes: 2

Views: 601

Answers (1)

freakish

Reputation: 56467

Because main1 creates all the tasks up front and only awaits them once they have all been created. Everything happens in parallel, so the total time is the maximum of all the times, which is 2s.

main2, on the other hand, creates a new task only after the previous one has finished. Everything happens sequentially, so the total time is the sum of all the times, which (judging from the code) should be 30s.
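To check this without waiting half a minute, here is a minimal sketch that times both patterns with the delays scaled down. The UNIT factor and the helper names (all_then_await, one_at_a_time, timed) are my own additions for illustration, not part of the question's code.

```python
import asyncio
import time

UNIT = 0.02  # assumed scale factor: 1 "second" of the original code = 0.02 s here


async def say_after(delay, what):
    await asyncio.sleep(delay)
    return what


async def all_then_await(n, unit):
    # main1 pattern: schedule every task first, then await them one by one
    tasks = []
    for _ in range(n):
        tasks.append(asyncio.create_task(say_after(1 * unit, 'hello')))
        tasks.append(asyncio.create_task(say_after(2 * unit, 'world')))
    for task in tasks:
        await task


async def one_at_a_time(n, unit):
    # main2 pattern: each task is created only after the previous one finished
    for _ in range(n):
        await asyncio.create_task(say_after(1 * unit, 'hello'))
        await asyncio.create_task(say_after(2 * unit, 'world'))


def timed(coro_fn, *args):
    # Run one coroutine function to completion and return the elapsed wall time
    start = time.perf_counter()
    asyncio.run(coro_fn(*args))
    return time.perf_counter() - start


concurrent_time = timed(all_then_await, 10, UNIT)
sequential_time = timed(one_at_a_time, 10, UNIT)
print(f"all tasks first: {concurrent_time / UNIT:.1f} units")
print(f"one at a time:   {sequential_time / UNIT:.1f} units")
```

The first figure should land near 2 units and the second near 30, plus a little event loop overhead.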

Edit: say you have 3 tasks: task1, task2, task3. If you do

  1. create task1
  2. await task1
  3. create task2
  4. await task2
  5. create task3
  6. await task3

then the total execution time is obviously task1.time + task2.time + task3.time because there is no background processing. The flow is sequential. Now let's say you do

  1. create task1
  2. create task2
  3. create task3
  4. await task1
  5. await task2
  6. await task3

Now task1, task2 and task3 all run in the background. So step 4 takes T1 = task1.time. But step 5 takes only T2 = max(task2.time - T1, 0), because task2 has already been running in the background for T1. Step 6 takes T3 = max(task3.time - T1 - T2, 0), because task3 has already been running in the background for T1 + T2. Some maths is required to show that T1 + T2 + T3 = max(task1.time, task2.time, task3.time).

But the intuition is this: if taskX is the longest one and it has finished, then everything else has finished too, thanks to the parallel processing. So every remaining await returns immediately, making the total processing time the maximum of all the times.

Side note: there are nuances. This only works when the tasks do something parallelizable, i.e. they actually yield to the event loop, like asyncio.sleep() does. If the tasks are synchronous (say, some CPU calculations), then both cases will give 30s.
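The arithmetic can be checked with a few lines of plain Python. This is only a model of the reasoning above, assuming all tasks start at time 0 and are awaited in order; the function name incremental_waits is made up for illustration.

```python
def incremental_waits(durations):
    """Time spent blocked at each await, if all tasks started at time 0."""
    elapsed = 0.0
    waits = []
    for duration in durations:
        # A task has already been running for `elapsed`, so only the rest remains
        wait = max(duration - elapsed, 0.0)
        waits.append(wait)
        elapsed += wait
    return waits


durations = [1, 2, 1.5]
waits = incremental_waits(durations)
print(waits, sum(waits), max(durations))
```

Whatever order the durations come in, the waits should add up to the longest duration.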
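A small sketch of that side note, using time.sleep() as a stand-in for CPU-bound work (that substitution, the 0.05 s delay and the helper names are assumptions of mine). All tasks are created up front in both runs; only the kind of work differs.

```python
import asyncio
import time


async def blocking_work(delay):
    # time.sleep() never yields to the event loop, so no other task can run meanwhile
    time.sleep(delay)


async def cooperative_work(delay):
    # asyncio.sleep() suspends this task and lets the other tasks make progress
    await asyncio.sleep(delay)


async def run_all(worker, n, delay):
    # Create every task first (the main1 pattern), then await them in order
    tasks = [asyncio.create_task(worker(delay)) for _ in range(n)]
    start = time.perf_counter()
    for task in tasks:
        await task
    return time.perf_counter() - start


blocking_time = asyncio.run(run_all(blocking_work, 5, 0.05))
cooperative_time = asyncio.run(run_all(cooperative_work, 5, 0.05))
print(f"blocking:    {blocking_time:.2f}s")
print(f"cooperative: {cooperative_time:.2f}s")
```

The blocking run should take roughly the sum of the delays, the cooperative one roughly a single delay.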

Edit2: So your main3 has a slightly different flow. It lets two tasks run in parallel, but no more:

  1. create task1
  2. create task2
  3. await task1
  4. await task2
  5. create task3
  6. create task4
  7. await task3
  8. await task4

So this time task1 and task2 run in parallel. But only after they are done can task3 and task4 run, again in parallel. So for each group the total time is the maximum, but you have to sum over the separate groups. I.e. the total execution time is max(task1.time, task2.time) + max(task3.time, task4.time), which in your case is

max(1,2) + ... + max(1,2) [10 times] = 20
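As a rough check of that formula, the sketch below computes the prediction and times the main3 pattern with scaled-down delays. UNIT and the name pairs_in_groups are my own assumptions; the measured figure will sit slightly above the prediction because of event loop overhead.

```python
import asyncio
import time

UNIT = 0.05  # assumed scale factor: 1 "second" of the original code = 0.05 s here


async def say_after(delay, what):
    await asyncio.sleep(delay)
    return what


async def pairs_in_groups(n, unit):
    # main3 pattern: two tasks overlap, but the next pair starts only after both finish
    for _ in range(n):
        task1 = asyncio.create_task(say_after(1 * unit, 'hello'))
        task2 = asyncio.create_task(say_after(2 * unit, 'world'))
        await task1
        await task2


groups = [(1, 2)] * 10
predicted_units = sum(max(group) for group in groups)  # sum of per-group maxima

start = time.perf_counter()
asyncio.run(pairs_in_groups(10, UNIT))
measured_units = (time.perf_counter() - start) / UNIT

print(f"predicted: {predicted_units} units, measured: {measured_units:.1f} units")
```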

Upvotes: 4
