Reputation: 43
I’ve been digging into FastAPI’s handling of synchronous and asynchronous endpoints, and I’ve come across a few things that I’m trying to understand more clearly, especially with regard to how blocking operations behave in Python.
From what I understand, when a synchronous route (defined with def) is called, FastAPI offloads it to a separate thread from the thread pool to avoid blocking the main event loop. This makes sense, as the thread can be blocked (e.g., time.sleep()), but the event loop itself doesn’t get blocked because it continues handling other requests.
But here’s my confusion: If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Here's an example:
from fastapi import APIRouter
import asyncio

app = APIRouter()

@app.get('/sync')
def tarefa_sincrona():
    print('Sync')
    total = 0
    for i in range(10223424*1043):
        total += i
    print('Sync task done')

@app.get('/async')
async def tarefa_assincrona():
    print('Async task')
    await asyncio.sleep(5)
    print('Async task done')
If I make two requests — the first one to the sync endpoint and the second one to the async endpoint — almost at the same time, I expected the event loop to be blocked. However, in reality, what happens is that the two requests are executed "in parallel."
Upvotes: 3
Views: 197
Reputation: 4927
If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Only one thread is indeed executed at a time. The flaw in the quoted question is the assumption that time.sleep() keeps the thread active - as another answerer has pointed out, it does not. The TL;DR is that time.sleep() does block the calling thread, but it is wrapped in a C macro that releases its lock on the global interpreter for the duration of the sleep.
Concurrency in Python (with GIL)
CPython will periodically release the running thread's GIL if there are other threads waiting for execution time. Voluntarily releasing locks is pretty common, too; in C extensions, it's practically mandatory:
- Py_BEGIN_ALLOW_THREADS is a macro for { PyThreadState *_save; _save = PyEval_SaveThread();
- PyEval_SaveThread() releases the GIL.
- time.sleep() voluntarily releases the lock on the global interpreter with the macro mentioned above.
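You can verify this with a couple of threads - a minimal sketch of my own (the loop bound is arbitrary; tune it so the CPU work takes roughly a second on your machine):

# sleep_releases_gil.py - if time.sleep() held the GIL, the CPU work
# below couldn't run until the sleep finished (~2s total wall time);
# because the sleep releases the GIL, the two overlap and the total
# is roughly ~1s.
import threading
import time

def cpu_work():
    total = 0
    for i in range(30_000_000):  # ~1s of work, machine-dependent
        total += i

start = time.perf_counter()
t = threading.Thread(target=cpu_work)
t.start()
time.sleep(1)  # blocks this thread, but releases the GIL while waiting
t.join()
print(f"wall time: {time.perf_counter() - start:.2f}s")  # ~1s, not ~2s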
Synchronous threading:
As mentioned earlier, Python will regularly try to release the GIL so that other threads can get a bit of execution time.
For threads with a varied workload, this is smart. If a thread is waiting for I/O but its code doesn't voluntarily release the GIL, this mechanism will still result in the GIL being handed off to another thread.
For threads that are entirely or primarily CPU-bound, it works... but it doesn't speed up execution. I'll include code that proves this at the end of the post.
The reason it doesn't provide a speed-up in this case is that CPU-bound operations aren't waiting on anything, so pausing func_1 to give execution time to func_2 just means that func_1 is idle for no reason - with the result that func_1's completion time gets pushed back by however much execution time is granted to func_2.
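Before the full FastAPI demo at the end of the post, here's a minimal standalone sketch of the same effect (my own; timings are machine-dependent):

# threads_no_speedup.py - two CPU-bound jobs take about as long in two
# threads as they do back-to-back, because the GIL only lets one thread
# execute Python bytecode at a time.
import threading
import time

def count(n: int = 30_000_000) -> None:
    total = 0
    for i in range(n):
        total += i

start = time.perf_counter()
count()
count()
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=count) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded:   {time.perf_counter() - start:.2f}s")  # roughly the same, just interleaved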
Inside of an event loop:
asyncio's event loop is single-threaded, which is to say that it doesn't spawn new threads. Each coroutine that runs uses the main thread (the same thread the event loop lives in). The way this works is that the event loop and its coroutines cooperatively hand control of that one thread - and with it, the GIL - back and forth among themselves.
But why aren't coroutines offloaded to threads, so that CPython can step in and release the GIL to other threads?
Many reasons, but the easiest to grasp is maybe this: in practice, that would have meant running the risk of significantly lagging the event loop. Instead of immediately resuming its own work (scheduling the next coroutine) when the current coroutine finishes, the loop might have to wait for execution time because the GIL had been passed off elsewhere. Similarly, coroutines would take longer to finish due to constant context-switching.
Which is a long-winded way of saying that if time.sleep() didn't release its lock, or if you were running a long CPU-bound thing, a single thread would indeed block the entire event loop (by hogging the GIL).
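A sketch that makes the hogging visible (my own example; the heartbeat task should tick every 0.5s, but goes silent while the CPU-bound coroutine holds the thread):

# blocked_loop.py
import asyncio
import time

async def heartbeat():
    while True:
        print(f"{time.perf_counter():.2f}: tick")
        await asyncio.sleep(0.5)

async def cpu_hog():
    total = 0
    for i in range(50_000_000):  # never awaits, so it never yields the thread
        total += i
    return total

async def main():
    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(1)  # ticks appear normally here
    await cpu_hog()         # no ticks while this runs
    await asyncio.sleep(1)  # ticks resume
    hb.cancel()

asyncio.run(main())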
So what now?
Inside of GIL-bound Python, whether it's sync or async, the only way to execute CPU-bound code (that doesn't actively release its lock) with true parallelism is at the process level - so either multiprocessing or concurrent.futures.ProcessPoolExecutor - as each process will have its own GIL.
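A minimal sketch of that escape hatch in FastAPI (names like cpu_task and /offload are mine, not from the question; assumes the worker function lives at module level so it is picklable):

# process_offload.py
import asyncio
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor()  # each worker process gets its own GIL

def cpu_task(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

@app.get("/offload/{n}")
async def offload(n: int):
    loop = asyncio.get_running_loop()
    # The work runs in another process, so this event loop stays responsive.
    result = await loop.run_in_executor(pool, cpu_task, n)
    return {"result": result}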
So:

- async functions running CPU-bound code (with no voluntary yields) will run to completion before yielding the GIL.
- sync functions in separate threads running CPU-bound code with no voluntary yields will get paused periodically, and the GIL gets passed off elsewhere.
- (For clarity:) sync functions in the same thread will have no concurrency whatsoever.
The multiprocessing docs also hint very clearly at the above descriptions:

"The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads."
As do the threading docs:

"threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously"

Reading between the lines, this is much the same as saying that tasks bound by anything other than I/O won't achieve any noteworthy concurrency through threading.
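For contrast, a quick sketch (mine) of the I/O-bound case where threading does pay off, because every waiting thread releases the GIL:

# io_threads.py - four simulated 1s I/O waits finish in ~1s with threads,
# not ~4s, because a sleeping thread doesn't hold the GIL.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i: int) -> int:
    time.sleep(1)  # stands in for a network or disk wait
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(fake_io, range(4)))
print(f"{results} in {time.perf_counter() - start:.2f}s")  # ~1s total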
Testing it yourself:
# main.py
from fastapi import FastAPI
import time
import os
import threading

app = FastAPI()

def bind_cpu(id: int):
    thread_id = threading.get_ident()
    print(f"{time.perf_counter():.4f}: BIND GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id})")
    start = time.perf_counter()
    total = 0
    for i in range(100_000_000):
        total += i
    end = time.perf_counter()
    print(f"{time.perf_counter():.4f}: REL GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id}). Duration: {end-start:.4f}s")
    return total

def endpoint_handler(method: str, id: int):
    print(f"{time.perf_counter():.4f}: Worker reads {method} endpoint with ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    result = bind_cpu(id)
    print(f"{time.perf_counter():.4f}: Worker finished ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    return f"ID: {id}, {result}"

@app.get("/async/{id}")
async def async_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("async", id)

@app.get("/sync/{id}")
def sync_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("sync", id)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True, workers=1)
# test.py
import asyncio
import httpx
import time

async def send_requests():
    async with httpx.AsyncClient(timeout=httpx.Timeout(25.0)) as client:
        tasks = []
        for i in range(1, 5):
            print(f"{time.perf_counter():.4f}: Sending HTTP request for id: {i}")
            if i % 2 == 0:
                tasks.append(client.get(f"http://localhost:8000/async/{i}"))
            else:
                tasks.append(client.get(f"http://localhost:8000/sync/{i}"))
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(f"{time.perf_counter():.4f}: {response.text}")

asyncio.run(send_requests())
Start the server (python main.py), then fire the requests (python test.py). You will get results looking something like this:
[...]
INFO: Waiting for application startup.
INFO: Application startup complete.
10755.6897: Sending HTTP request for id: 1
10755.6900: Sending HTTP request for id: 2
10755.6902: Sending HTTP request for id: 3
10755.6904: Sending HTTP request for id: 4
10755.9722: Worker reads async endpoint with ID: 4 - internals: PID(24492), thread(8972)
10755.9725: BIND GIL for ID: 4, internals: PID(24492), thread(8972)
10759.4551: REL GIL for ID: 4, internals: PID(24492), thread(8972). Duration: 3.4823s
10759.4554: Worker finished ID: 4 - internals: PID(24492), thread(8972)
INFO: 127.0.0.1:56883 - "GET /async/4 HTTP/1.1" 200 OK
10759.4566: Worker reads async endpoint with ID: 2 - internals: PID(24492), thread(8972)
10759.4568: BIND GIL for ID: 2, internals: PID(24492), thread(8972)
10762.6428: REL GIL for ID: 2, internals: PID(24492), thread(8972). Duration: 3.1857s
10762.6431: Worker finished ID: 2 - internals: PID(24492), thread(8972)
INFO: 127.0.0.1:56884 - "GET /async/2 HTTP/1.1" 200 OK
10762.6446: Worker reads sync endpoint with ID: 3 - internals: PID(24492), thread(22648)
10762.6448: BIND GIL for ID: 3, internals: PID(24492), thread(22648)
10762.6968: Worker reads sync endpoint with ID: 1 - internals: PID(24492), thread(9144)
10762.7127: BIND GIL for ID: 1, internals: PID(24492), thread(9144)
10768.9234: REL GIL for ID: 3, internals: PID(24492), thread(22648). Duration: 6.2784s
10768.9338: Worker finished ID: 3 - internals: PID(24492), thread(22648)
INFO: 127.0.0.1:56882 - "GET /sync/3 HTTP/1.1" 200 OK
10769.2121: REL GIL for ID: 1, internals: PID(24492), thread(9144). Duration: 6.4835s
10769.2124: Worker finished ID: 1 - internals: PID(24492), thread(9144)
INFO: 127.0.0.1:56885 - "GET /sync/1 HTTP/1.1" 200 OK
10769.2138: "ID: 1, 4999999950000000"
10769.2141: "ID: 2, 4999999950000000"
10769.2143: "ID: 3, 4999999950000000"
10769.2145: "ID: 4, 4999999950000000"
Interpretation
Going over the timestamps and the durations, two things are immediately clear:

- async endpoints are executing de-facto synchronously
- sync endpoints are executing concurrently and finish at nearly the same time, BUT each request takes twice as long to complete compared to the async ones

Both of these results are expected, per the explanations earlier.
The async endpoints become de-facto synchronous because the function we built hoards the GIL, and so the event loop gets no execution time until the coroutine returns.
The sync endpoints become faux-asynchronous because the interpreter swaps the GIL between their threads every ~5ms (the default switch interval), which means that the first request increments by x%, then the second request increments by x% - repeat until both finish at roughly the same time.
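That ~5ms figure is CPython's default thread switch interval, which you can inspect and tune yourself:

import sys

print(sys.getswitchinterval())  # 0.005 by default
sys.setswitchinterval(0.01)     # coarser slices, fewer context switches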
Upvotes: 4
Reputation: 11297
If you go to the implementation of time.sleep in the C code, you will see that a portion of the code is wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.

Normally, only one thread can run at once. But these two macros set up a special section of code in which other threads are allowed to grab the Global Interpreter Lock.
Upvotes: 2
Reputation: 360
time.sleep() blocks the current thread, but it doesn't render the interpreter completely useless, since the interpreter still needs to keep track of the time. So it keeps working.

Think of it like a person looking at their clock and waiting. The person is capable of doing other things - they keep breathing, for example - but their main focus is to wait for some amount of time. Maybe they're waiting for their meal to cook.

In your asynchronous scenario, the Python interpreter just pauses one task and looks at another, so it is not completely useless. Think of it like round-robin scheduling: it works on one task for a limited slice of CPU time (waiting on the time.sleep() in this example), then pauses it and looks at another task. "The function is truly blocking" doesn't mean the interpreter is unable to do anything else; it just means that function has to wait for something.

So the person in our example does some other task, like loading dishes into the dishwasher, and after every 4 dishes placed they check their clock to see if their meal is ready. Cooking the meal is a blocking step in preparing dinner, since you need to wait for it to be cooked, but you can asynchronously load the dishes and check the time to see if the meal is ready.
Upvotes: 2