AdamHommer

Reputation: 758

Why is asyncio task switching so much slower than threading.Thread?

It's well known that asyncio is designed to speed up servers and increase the number of requests a web server can handle. However, in a test I ran today I was shocked to find that, for the purpose of switching between tasks, using Thread is much faster than using coroutines (even with a thread lock as a guarantee). Does that mean using coroutines is pointless?

I'm wondering why. Could anyone please help me figure this out?

Here's my test code: two tasks take turns incrementing a shared counter, 2,000,000 increments in total.

from threading import Thread , Lock
import time , asyncio

def thread_speed_test():

    def add1():
        nonlocal count
        for i in range(single_test_num):
            mutex.acquire()
            count += 1
            mutex.release()

    mutex = Lock()
    count = 0
    thread_list = list()
    for i in range(thread_num):
        thread_list.append(Thread(target = add1))

    st_time = time.time()
    for thr in thread_list:
        thr.start()

    for thr in thread_list:
        thr.join()

    ed_time = time.time()
    print("runtime" , count)
    print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s' ,end='\n\n')

def asyncio_speed_test():

    count = 0

    @asyncio.coroutine
    def switch():
        yield

    async def add1():
        nonlocal count
        for i in range(single_test_num):
            count += 1
            await switch()

    async def main():

        tasks = asyncio.gather(*(add1() for i in range(thread_num)))
        st_time = time.time()
        await tasks
        ed_time = time.time()
        print("runtime" , count)
        print(f'asyncio   finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

    asyncio.run(main())

if __name__ == "__main__":
    single_test_num = 1000000
    thread_num = 2
    thread_speed_test()
    asyncio_speed_test()

I got the following result on my PC:

2000000
threading finished in 0.9332s ,speed 2143159.1985q/s

2000000
asyncio   finished in 16.044s ,speed 124657.3379q/s

Update:

I realized that as the number of threads increases, the threading version gets slower while the asyncio version gets faster. Here are my test results:

# asyncio #
thread_num                 switches per second            average time per switch (ns)
         2                              122296                                    8176
        32                              243502                                    4106
       128                              252571                                    3959
       512                              253258                                    3948 
      4096                              239334                                    4178

# threading #
thread_num                 switches per second            average time per switch (ns)
         2                             2278386                                     438
         4                              737829                                    1350
         8                              393786                                    2539
        16                              367123                                    2720
        32                              369260                                    2708
        64                              381061                                    2624
       512                              381403                                    2622

Upvotes: 2

Views: 2260

Answers (2)

Paul Cornelius

Reputation: 10926

To make the comparison fairer, I changed your code slightly.

I replaced your simple Lock with a Condition. This allowed me to force a thread switch after each iteration of the counter. The Condition.wait() function call always blocks the thread where the call is made; the thread continues only when another thread calls Condition.notify(). Therefore a thread switch must occur.

This is not the case with your test. A task switch will only occur when the thread scheduler causes one, since the logic of your code never causes a thread to block. The Lock.release() function does not block the caller, unlike Condition.wait().
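
One rough way to check this is to count how often the lock actually passes from one thread to another. The sketch below is only an illustration and not part of the original benchmark; `count_lock_handoffs` and its parameters are made-up names. With a plain Lock I would expect the number of handoffs to be far smaller than the number of increments, although the exact figure depends on the interpreter version and the machine.

```python
from threading import Thread, Lock

def count_lock_handoffs(iterations=100_000, workers=2):
    """Return (handoffs, total): lock ownership changes and total increments."""
    lock = Lock()
    last_owner = None   # index of the worker that held the lock most recently
    handoffs = 0
    total = 0

    def work(me):
        nonlocal last_owner, handoffs, total
        for _ in range(iterations):
            with lock:
                total += 1
                if last_owner is not None and last_owner != me:
                    handoffs += 1   # a different worker got the lock this time
                last_owner = me

    threads = [Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handoffs, total

if __name__ == "__main__":
    handoffs, total = count_lock_handoffs()
    print(f"{handoffs} handoffs in {total} increments")
```

If the handoff count comes out much lower than the increment count, the threading figure in the question is mostly measuring uncontended lock acquisition rather than thread switches.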

There is one small difficulty: the last running thread will block forever when it calls Condition.wait() for the last time. That is why I introduced a simple counter to keep track of how many running threads are left. Also, when a thread is finished with its loop it has to make one final call to Condition.notify() in order to release the next thread.

The only change I made to your async test is to replace the "yield" statement with await asyncio.sleep(0). This was for compatibility with Python 3.8. I also reduced the number of trials by a factor of 10.
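
To see that `await asyncio.sleep(0)` hands control back to the event loop on every iteration, here is a small sketch (the names `worker` and `interleaving_demo` are just for illustration, and I am assuming the default asyncio event loop). Each task records its steps in a shared list, so the order of the entries shows when the switches happen.

```python
import asyncio

async def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        # Yield to the event loop once so the other task gets a turn.
        await asyncio.sleep(0)

async def interleaving_demo(steps=3):
    log = []
    await asyncio.gather(worker("a", steps, log), worker("b", steps, log))
    return log

if __name__ == "__main__":
    for entry in asyncio.run(interleaving_demo()):
        print(entry)
```

With two tasks the entries should alternate between "a" and "b", which means every loop iteration in the benchmark costs one trip through the event loop.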

Timings were on a fairly old Win10 machine with Python 3.8.

As you can see, the threading code is quite a bit slower. That's what I would expect. One reason to have async/await is that it is more lightweight than the threading mechanism.

from threading import Thread , Condition
import time , asyncio

def thread_speed_test():

    def add1():
        nonlocal count
        nonlocal thread_count
        for i in range(single_test_num):
            with mutex:
                mutex.notify()
                count += 1
                if thread_count > 1:
                    mutex.wait()
        # Decrement under the lock so it cannot race with the check in the loop.
        with mutex:
            thread_count -= 1
            mutex.notify()

    mutex = Condition()
    count = 0
    thread_count = thread_num
    thread_list = list()
    for i in range(thread_num):
        thread_list.append(Thread(target = add1))

    st_time = time.time()
    for thr in thread_list:
        thr.start()

    for thr in thread_list:
        thr.join()

    ed_time = time.time()
    print("runtime" , count)
    print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s' ,end='\n\n')

def asyncio_speed_test():

    count = 0

    async def switch():
        await asyncio.sleep(0)

    async def add1():
        nonlocal count
        for i in range(single_test_num):
            count += 1
            await switch()

    async def main():

        tasks = asyncio.gather(*(add1() for i in range(thread_num)))
        st_time = time.time()
        await tasks
        ed_time = time.time()
        print("runtime" , count)
        print(f'asyncio   finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

    asyncio.run(main())

if __name__ == "__main__":
    single_test_num = 100000
    thread_num = 2
    thread_speed_test()
    asyncio_speed_test()

runtime 200000
threading finished in 4.0335s ,speed 49584.7548q/s

runtime 200000
asyncio   finished in 1.7519s ,speed 114160.9466q/s

Upvotes: 1

Joe

Reputation: 7121

I am not sure, but you might be comparing apples to oranges.

You are basically penalizing asyncio by forcing it to switch contexts on every iteration, which takes time, while the threads are allowed to run freely.

asyncio is meant for tasks that spend time waiting for input. That is not the case in your benchmark.

For a fair comparison you should simulate some realistic delay.
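
Something along these lines might be a starting point. It is only a sketch: `DELAY` and `TASKS` are arbitrary values I picked to stand in for network or disk latency, and `run_threads` / `run_asyncio` are hypothetical helper names. Each worker does nothing except wait, which is closer to the workload asyncio is aimed at.

```python
import asyncio
import time
from threading import Thread

DELAY = 0.01   # assumed per-task wait in seconds, standing in for I/O latency
TASKS = 200    # assumed number of concurrent waiters

def run_threads(n=TASKS, delay=DELAY):
    """Start n threads that each block in time.sleep, return elapsed seconds."""
    threads = [Thread(target=time.sleep, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def run_asyncio(n=TASKS, delay=DELAY):
    """Run n coroutines that each await asyncio.sleep, return elapsed seconds."""
    async def main():
        start = time.perf_counter()
        await asyncio.gather(*(asyncio.sleep(delay) for _ in range(n)))
        return time.perf_counter() - start
    return asyncio.run(main())

if __name__ == "__main__":
    print(f"threading: {run_threads():.4f}s")
    print(f"asyncio:   {run_asyncio():.4f}s")
```

Try raising `TASKS` and see how each version scales; the numbers will vary a lot between machines, so treat them as a rough guide only.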

Upvotes: 0
