Is this multi-threaded function asynchronous

I'm afraid I'm still a bit confused (despite checking other threads) whether:

all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous

My initial guess is no to both and that proper asynchronous code should be able to run in one thread - however it can be improved by adding threads for example like so:

So I constructed this toy example:

from threading import *
from queue import Queue
import time

def do_something_with_io_lag(in_work):
    out = in_work
    # Imagine we do some work that involves sending
    # something over the internet and processing the output
    # once it arrives
    time.sleep(0.5) # simulate IO lag
    print("Hello, bee number: ",
          str(current_thread().name).replace("Thread-",""))

class WorkerBee(Thread):
    def __init__(self, q):
        Thread.__init__(self)
        self.q = q

    def run(self):
        while True:
            # Get some work from the queue
            work_todo = self.q.get()
            # This function will simiulate I/O lag
            do_something_with_io_lag(work_todo)
            # Remove task from the queue
            self.q.task_done()

if __name__ == '__main__':
    def time_me(nmbr):
        number_of_worker_bees = nmbr
        worktodo = ['some input for work'] * 50

        # Create a queue
        q = Queue()
        # Fill with work
        [q.put(onework) for onework in worktodo]
        # Launch processes
        for _ in range(number_of_worker_bees):
            t = WorkerBee(q)
            t.start()
        # Block until queue is empty
        q.join()

    # Run this code in serial mode (just one worker)
    %time time_me(nmbr=1)
    # Wall time: 25 s
    # Basically 50 requests * 0.5 seconds IO lag
    # For me everything gets processed by bee number: 59

    # Run this code using multi-tasking (launch 50 workers)
    %time time_me(nmbr=50)
    # Wall time: 507 ms
    # Basically the 0.5 second IO lag + 0.07 seconds it took to launch them
    # Now everything gets processed by different bees

Is it asynchronous?

To me this code does not seem asynchronous because it is Figure 3 in my example diagram. The I/O call blocks the thread (although we don't feel it because they are blocked in parallel).

However, if this is the case I am confused why requests-futures is considered asynchronous since it is a wrapper around ThreadPoolExecutor:

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in     get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()

Can this function on just one thread?

Especially when compared to asyncio, which means it can run single-threaded

There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.

Upvotes: 4

Answers (2)

Nikita

Reputation: 6341

First of all, one note: concurrent.futures.Future is not the same as asyncio.Future. Basically it's just an abstraction - an object, that allows you to refer to job result (or exception, which is also a result) in your program after you assigned a job, but before it is completed. It's similar to assigning common function's result to some variable.

Multithreading: Regarding your example, when using multiple threads you can say that your code is "asynchronous" as several operations are performed in different threads at the same time without waiting for each other to complete, and you can see it in the timing results. And you're right, your function due to sleep is blocking, it blocks the worker thread for the specified amount of time, but when you use several threads those threads are blocked in parallel. So if you would have one job with sleep and the other one without and run multiple threads, the one without sleep would perform calculations while the other would sleep. When you use single thread, the jobs are performed in in a serial manner one after the other, so when one job sleeps the other jobs wait for it, actually they just don't exist until it's their turn. All this is pretty much proven by your time tests. The thing happened with print has to do with "thread safety", i.e. print uses standard output, which is a single shared resource. So when your multiple threads tried to print at the same time the switching happened inside and you got your strange output. (This also show "asynchronicity" of your multithreaded example.) To prevent such errors there are locking mechanisms, e.g. locks, semaphores, etc.

Asyncio: To better understand the purpose note the "IO" part, it's not 'async computation', but 'async input/output'. When talking about asyncio you usually don't think about threads at first. Asyncio is about event loop and generators (coroutines). The event loop is the arbiter, that governs the execution of coroutines (and their callbacks), that were registered to the loop. Coroutines are implemented as generators, i.e. functions that allow to perform some actions iteratively, saving state at each iteration and 'returning', and on the next call continuing with the saved state. So basically the event loop is while True: loop, that calls all coroutines/generators, assigned to it, one after another, and they provide result or no-result on each such call - this provides possibility for "asynchronicity". (A simplification, as there's scheduling mechanisms, that optimize this behavior.) The event loop in this situation can run in single thread and if coroutines are non-blocking it will give you true "asynchronicity", but if they are blocking then it's basically a linear execution.

You can achieve the same thing with explicit multithreading, but threads are costly - they require memory to be assigned, switching them takes time, etc. On the other hand asyncio API allows you to abstract from actual implementation and just consider your jobs to be performed asynchronously. It's implementation may be different, it includes calling the OS API and the OS decides what to do, e.g. DMA, additional threads, some specific microcontroller use, etc. The thing is it works well for IO due to lower level mechanisms, hardware stuff. On the other hand, performing computation will require explicit breaking of computation algorithm into pieces to use as asyncio coroutine, so a separate thread might be a better decision, as you can launch the whole computation as one there. (I'm not talking about algorithms that are special to parallel computing). But asyncio event loop might be explicitly set to use separate threads for coroutines, so this will be asyncio with multithreading.

Regarding your example, if you'll implement your function with sleep as asyncio coroutine, shedule and run 50 of them single threaded, you'll get time similar to the first time test, i.e. around 25s, as it is blocking. If you will change it to something like yield from [asyncio.sleep][3](0.5) (which is a coroutine itself), shedule and run 50 of them single threaded, it will be called asynchronously. So while one coroutine will sleep the other will be started, and so on. The jobs will complete in time similar to your second multithreaded test, i.e. close to 0.5s. If you will add print here you'll get good output as it will be used by single thread in serial manner, but the output might be in different order then the order of coroutine assignment to the loop, as coroutines could be run in different order. If you will use multiple threads, then the result will obviously be close to the last one anyway.

Simplification: The difference in multythreading and asyncio is in blocking/non-blocking, so basicly blocking multithreading will somewhat come close to non-blocking asyncio, but there're a lot of differences.

Multithreading for computations (i.e. CPU bound code)
Asyncio for input/output (i.e. I/O bound code)

Regarding your original statement:

all asynchronous code is multi-threaded

all multi-threaded functions are asynchronous

I hope that I was able to show, that:

asynchronous code might be both single threaded and multi-threaded

all multi-threaded functions could be called "asynchronous"

Upvotes: 4

Ulrich Eckhardt

Reputation: 17444

I think the main confusion comes from the meaning of asynchronous. From the Free Online Dictionary of Computing, "A process [...] whose execution can proceed independently" is asynchronous. Now, apply that to what your bees do:

Retrieve an item from the queue. Only one at a time can do that, while the order in which they get an item is undefined. I wouldn't call that asynchronous.
Sleep. Each bee does so independently of all others, i.e. the sleep duration runs on all, otherwise the time wouldn't go down with multiple bees. I'd call that asynchronous.
Call print(). While the calls are independent, at some point the data is funneled into the same output target, and at that point a sequence is enforced. I wouldn't call that asynchronous. Note however that the two arguments to print() and also the trailing newline are handled independently, which is why they can be interleaved.
Lastly, the call to q.join(). Here of course the calling thread is blocked until the queue is empty, so some kind of synchronization is enforced and wanted. I don't see why this "seems to break" for you.

Upvotes: 0

Is this multi-threaded function asynchronous

Answers (2)

Related Questions