user422005

Reputation: 2031

asyncio - is threading involved in the implementation?

Update: The entire premise of this question just demonstrated my lack of understanding of the concept; insightful answers below - but the question in its entirety is "just wrong".

I am trying to teach myself about the Python async execution model. The following example program downloads five different web pages asynchronously:


#!/usr/bin/env python3
import requests
import asyncio


async def download(url):
    response = requests.get(url)
    print(f"Have downloaded {url}")



async def async_main():
    for url in ["https://www.aftenposten.no",
                "https://www.vg.no",
                "https://lwn.net",
                "https://www.dagbladet.no",
                "https://www.nrk.no"]:
        await download(url)

loop = asyncio.get_event_loop()
loop.run_until_complete(async_main())

~it has roughly the expected speedup and works as expected - all good!~ However, I am struggling to understand what happens on the await download(url) line. My layman's understanding is that the following process takes place:

  1. The download(url) function is called - "in the background".
  2. The event loop pauses the current coroutine instance and starts the next.

However - for this to work, the download(url) call must be in "some execution context"; my guess is that the async implementation is threaded internally? I.e. after some initial fencing, the async implementation will invoke download(url) in a separate execution context - i.e. a thread? This is in some contrast to the documentation, which states that the async concurrency model does not involve multiple threads/processes?
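
One way I can think of to check this is to print the current thread from inside the coroutine; a quick sketch, reusing the program above with a shortened URL list:

#!/usr/bin/env python3
import threading
import asyncio
import requests


async def download(url):
    response = requests.get(url)
    # Report which thread actually ran the body of the coroutine.
    print(f"Have downloaded {url} in thread {threading.current_thread().name} "
          f"(active threads: {threading.active_count()})")


async def async_main():
    for url in ["https://lwn.net", "https://www.nrk.no"]:
        await download(url)

asyncio.get_event_loop().run_until_complete(async_main())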

Grateful for a clarification.

Update: the speedup has been questioned both in comments and answers. I have now redone the timing a bit more carefully, and see now that I was wrong - probably saw the result I wanted to see.... More careful timing indicates that the serial version is slightly faster. Sorry about the confusion.

Upvotes: 1

Views: 1228

Answers (3)

larsks

Reputation: 311516

There is no threading involved in asyncio. Fundamentally, asyncio coroutines are just fancy Python generators, wrapped in an event loop to handle scheduling.

Consider the following code:

import string
import time


def task1():
    for x in string.digits:
        yield x
        time.sleep(0.5)


def task2():
    for x in string.ascii_lowercase:
        yield x
        time.sleep(0.5)


def loop():
    tasks = [task1(), task2()]
    completed = set()

    while tasks:
        for t in tasks:
            try:
                print(f"task {t.__name__} says:", next(t))
            except StopIteration:
                completed.add(t)

        tasks = [t for t in tasks if t not in completed]


if __name__ == "__main__":
    loop()

Here, I have defined two generators (task1 and task2). In loop(), I "start" both tasks; because they use yield, calling the function returns an iterator, rather than executing the function code.

Now they are both running concurrently, though not in parallel -- much like asyncio coroutines. Each function can run as long as it wants until it calls yield, at which point control returns to the loop() function, which gets to decide which task executes next.

Running the above code produces output that looks like:

task task1 says: 0
task task2 says: a
task task1 says: 1
task task2 says: b
task task1 says: 2
task task2 says: c
task task1 says: 3
task task2 says: d
task task1 says: 4
task task2 says: e
task task1 says: 5
task task2 says: f
task task1 says: 6
task task2 says: g
task task1 says: 7
task task2 says: h
task task1 says: 8
task task2 says: i
task task1 says: 9
task task2 says: j
task task2 says: k
...

The article "From yield to async/await " seems to be a really great overview of the topic.

Upvotes: 1

Masklinn

Reputation: 42227

it has roughly the expected speedup and works as expected

That doesn't really make sense? There is no speedup in your program; it's completely sequential. If anything there's a small slowdown because it needs to set up an async event loop for nothing.

The download(url) function is called - "in the background".

No, the download(url) function is called in the foreground, but rather than running the function body right then and there, the call creates a coroutine object. await then "passes" that coroutine upwards until it reaches the event loop, which can run it.
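
You can see this at the REPL: calling an async function does not run its body, it only builds a coroutine object. A minimal sketch (not using the question's requests code):

import asyncio

async def download(url):
    print(f"actually fetching {url}")
    return url

# Calling the async function does not execute its body;
# it only builds a coroutine object.
coro = download("https://example.com")
print(coro)          # e.g. <coroutine object download at 0x...>

# The body only runs once the event loop drives the coroutine.
print(asyncio.run(coro))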

The event loop pauses the current coroutine instance and starts the next.

Coroutines are cooperative, so it's the exact opposite: the event loop runs a coroutine until that coroutine decides to stop.

At this point, if the coroutine yields an awaitable, the event loop registers the awaitable internally in order to know when it is ready to progress, and runs (resumes) the next task.
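
A small sketch of that hand-off; the task names and asyncio.sleep calls here are just illustrative, not taken from the question:

import asyncio

async def task(name):
    for i in range(3):
        print(f"{name}: step {i}")
        # The coroutine chooses to stop here; the loop registers the
        # awaitable and resumes whichever other task is ready.
        await asyncio.sleep(0.1)

async def main():
    await asyncio.gather(task("task1"), task("task2"))

asyncio.run(main())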

However - for this to work the download(url) call must be in "some execution context", i.e. my guess is that the async implementation is threaded internally? I.e. after some initial fencing the async implementation will invoke the download(url) in a separate execution context - i.e. thread?

Your program doesn't work asynchronously at all because requests has no async support (hence nothing being await-ed); it's completely blocking.

But an async-aware library would not normally use threading internally; instead it would use non-blocking IO primitives.
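
For instance, with an async-aware HTTP client the downloads can actually overlap. This is only a sketch and assumes the third-party aiohttp package, which the question's code does not use:

import asyncio
import aiohttp  # assumed third-party dependency, not part of the original program

async def download(session, url):
    # The awaits below hand control back to the event loop while the
    # socket I/O is in flight, so other downloads can progress.
    async with session.get(url) as response:
        await response.read()
        print(f"Have downloaded {url}")

async def async_main(urls):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(download(session, url) for url in urls))

asyncio.run(async_main(["https://lwn.net", "https://www.nrk.no"]))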

There are limited cases where the OS does not support or provide non-blocking IO for an IO task (network address resolution, i.e. gethostbyname / getaddrinfo, is probably the most common one), in which case the runtime may maintain a pool of helper threads for that purpose, but that should not be the baseline. Python's own asyncio does exactly this for getaddrinfo, resolving addresses on its default thread-pool executor.

This is in some contrast to the documentation which states that the async concurrency model does not involve multiple threads/processes?

No, the documentation is broadly correct.

Upvotes: 2

Frank C.

Reputation: 8088

The documentation is correct.

It involves passing control to download(url); it does not signal it. Think of it more as a subroutine: while it runs, nothing else is running, until download(url) relinquishes control or completes.
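
A small sketch of that behaviour, with time.sleep standing in for the blocking requests.get (the names are illustrative):

import asyncio
import time

async def blocking_download():
    # Stands in for requests.get(): a plain blocking call that never
    # hands control back to the event loop.
    time.sleep(2)
    print("download finished")

async def ticker():
    for _ in range(4):
        print("tick")
        await asyncio.sleep(0.5)

async def main():
    # The ticker cannot make progress while blocking_download() holds the
    # loop: "download finished" appears before any "tick".
    await asyncio.gather(blocking_download(), ticker())

asyncio.run(main())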

Upvotes: 1
