Reputation: 3307
Requires Python 3.7 or later.
The two functions main1 and main2 are defined below. One creates all the tasks and then awaits them at the end; the other creates and awaits each task one at a time.
main1 takes 2 seconds, while main2 takes 30 seconds. Why?
import asyncio

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main1():
    tasks = []
    for _ in range(10):
        task1 = asyncio.create_task(say_after(1, 'hello'))
        task2 = asyncio.create_task(say_after(2, 'world'))
        tasks.append(task1)
        tasks.append(task2)
    for x in tasks:
        await x

async def main2():
    for _ in range(10):
        await asyncio.create_task(say_after(1, 'hello'))
        await asyncio.create_task(say_after(2, 'world'))

asyncio.run(main2())
EDIT 1:
Here is a main3 version, which takes 20 seconds. I'd say the whole thing is just counterintuitive :(
async def main3():
    for _ in range(10):
        task1 = asyncio.create_task(say_after(1, 'hello'))
        task2 = asyncio.create_task(say_after(2, 'world'))
        await task1
        await task2
EDIT 2:
(Some more sample code is added below.) I've read the detailed answers from @freakish, but I'm still stuck on one point: do only consecutive await statements cooperate to run in parallel (main4)?
Since create_task() takes no time (right?), why don't the two await statements in main5 both run in the background, so that main5 would take the max time of (task1, task2)?
Is this await mechanism by design, or just an asyncio limitation (in design or in implementation)?
And is the detailed behavior of await defined anywhere in the official Python docs?
# took 2 seconds
async def main4():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    task2 = asyncio.create_task(say_after(2, 'world'))
    await task1
    await task2

# took 3 seconds
async def main5():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    await task1
    task2 = asyncio.create_task(say_after(2, 'world'))
    await task2
Upvotes: 2
Views: 601
Reputation: 56467
Because main1 creates all the tasks up front and only then awaits them. Everything runs in parallel, so the total time is the maximum of the individual times, which is 2s.
main2, on the other hand, creates a new task only after the previous one finishes. Everything runs sequentially, so the total time is the sum of the individual times, which (judging from the code) should be 30s.
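To check the two timings without waiting 30 seconds, here is a minimal sketch with scaled-down delays. The helper names (all_then_await, one_at_a_time, timed) are made up for illustration and are not part of the question's code.

```python
import asyncio
import time


async def say_after(delay, what):
    await asyncio.sleep(delay)
    return what


async def all_then_await(delays):
    # Same shape as main1: create every task first, then await them.
    tasks = [asyncio.create_task(say_after(d, d)) for d in delays]
    for t in tasks:
        await t


async def one_at_a_time(delays):
    # Same shape as main2: the next task is created only after the
    # previous one has finished.
    for d in delays:
        await asyncio.create_task(say_after(d, d))


def timed(coro_fn, delays):
    # Hypothetical helper: run the coroutine and return elapsed seconds.
    start = time.perf_counter()
    asyncio.run(coro_fn(delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.2] * 3
    print("all_then_await:", round(timed(all_then_await, delays), 2))
    print("one_at_a_time: ", round(timed(one_at_a_time, delays), 2))
```

The first number should land near max(delays) and the second near sum(delays), give or take scheduling overhead.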
Edit: say you have 3 tasks: task1, task2, task3. If you create and await each one in turn, then the total execution time is obviously task1.time + task2.time + task3.time, because there is no background processing. The flow is sequential. Now let's say you create all three tasks first (steps 1 to 3) and only then await them one after another (steps 4 to 6).
Now task1, task2, task3 run in the background. So it takes T1 = task1.time to process step 4. But step 5 takes only T2 = max(task2.time - T1, 0), because task2 has already been working in the background for T1. Step 6 takes T3 = max(task3.time - T2 - T1, 0), because task3 has already been working in the background for T1+T2. A bit of math shows that T1+T2+T3 = max(task1.time, task2.time, task3.time): T1+T2 = max(task1.time, task2.time), and adding T3 extends that maximum to all three tasks.
But the intuition is this: if taskX was the longest one and it has finished, then everything else has finished too, thanks to parallel processing. So the remaining awaits return immediately, making the total processing time the maximum of all times.
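The T1, T2, T3 reasoning can be observed directly by timing each await. This is only a sketch under assumed names (work and measure_awaits are hypothetical helpers), with delays scaled down so it runs quickly.

```python
import asyncio
import time


async def work(delay):
    await asyncio.sleep(delay)


async def measure_awaits(delays):
    # Create all tasks first so they run in the background.
    tasks = [asyncio.create_task(work(d)) for d in delays]
    waits = []
    # Then await them in order, recording how long each await blocks.
    for t in tasks:
        start = time.perf_counter()
        await t
        waits.append(time.perf_counter() - start)
    return waits


if __name__ == "__main__":
    waits = asyncio.run(measure_awaits([0.1, 0.3, 0.2]))
    print([round(w, 2) for w in waits], "total:", round(sum(waits), 2))
```

With delays of 0.1, 0.3 and 0.2, the third await should return almost instantly, because that task finished while the second one was being awaited, and the waits should add up to roughly the longest delay.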
Side note: there are nuances. This only works when the tasks actually do parallelizable stuff like asyncio.sleep(). If those tasks are synchronous (say, some CPU calculations), then both cases will give 30s.
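A small sketch of that side note, assuming time.sleep() as a stand-in for CPU-bound work (it blocks the event loop instead of yielding to it). The names blocking_work and create_all_then_await are hypothetical.

```python
import asyncio
import time


async def blocking_work(delay):
    # time.sleep never yields to the event loop, so no other task
    # can make progress while it runs.
    time.sleep(delay)


async def create_all_then_await(delays):
    # Tasks are created up front, just like in main1.
    tasks = [asyncio.create_task(blocking_work(d)) for d in delays]
    for t in tasks:
        await t


def timed(coro_fn, delays):
    # Hypothetical helper: run the coroutine and return elapsed seconds.
    start = time.perf_counter()
    asyncio.run(coro_fn(delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(round(timed(create_all_then_await, [0.1, 0.1, 0.1]), 2))
```

Even though every task is created before the first await, the elapsed time should be close to the sum of the delays rather than the maximum.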
Edit2: Your main3 has a slightly different flow. It lets two tasks run in parallel, but no more: each loop iteration creates two tasks and awaits both of them before the next iteration creates the next two.
So this time task1 and task2 run in parallel. But only after they are done can task3 and task4 run, again in parallel. So for each group the total time is the maximum, but you have to sum over the separate groups. I.e. the total execution time is max(task1.time, task2.time) + max(task3.time, task4.time) + ..., which in your case is
max(1,2) + ... + max(1,2) [10 times] = 20
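The group-by-group sum can be checked with a scaled-down version of the main3 pattern. This is a sketch only; paired_batches and its parameters are illustrative names, not taken from the question.

```python
import asyncio
import time


async def paired_batches(n, short, long):
    # Same shape as main3: each iteration runs two tasks side by side,
    # and the next pair starts only after both have finished.
    for _ in range(n):
        task1 = asyncio.create_task(asyncio.sleep(short))
        task2 = asyncio.create_task(asyncio.sleep(long))
        await task1
        await task2


def timed_batches(n, short, long):
    # Hypothetical helper: return elapsed seconds for n pairs.
    start = time.perf_counter()
    asyncio.run(paired_batches(n, short, long))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(round(timed_batches(3, 0.05, 0.1), 2))
```

For 3 pairs with delays 0.05 and 0.1, the elapsed time should be roughly 3 * max(0.05, 0.1), mirroring the 10 * max(1, 2) = 20 calculation.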
Upvotes: 4