Reputation: 758
It's well known that asyncio is designed to speed up servers and increase the number of requests a web server can handle. However, in a test I ran today, I was shocked to find that switching between tasks is much faster with threads than with coroutines (even with a thread lock as a guarantee). Does that mean using coroutines is pointless?
I'm wondering why. Could anyone help me figure this out?
Here's my test code: two tasks take turns incrementing a shared counter, 2,000,000 times in total.
from threading import Thread, Lock
import time, asyncio

def thread_speed_test():
    def add1():
        nonlocal count
        for i in range(single_test_num):
            mutex.acquire()
            count += 1
            mutex.release()

    mutex = Lock()
    count = 0
    thread_list = list()
    for i in range(thread_num):
        thread_list.append(Thread(target=add1))
    st_time = time.time()
    for thr in thread_list:
        thr.start()
    for thr in thread_list:
        thr.join()
    ed_time = time.time()
    print("runtime", count)
    print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s', end='\n\n')

def asyncio_speed_test():
    count = 0

    @asyncio.coroutine
    def switch():
        yield

    async def add1():
        nonlocal count
        for i in range(single_test_num):
            count += 1
            await switch()

    async def main():
        tasks = asyncio.gather(*(add1() for i in range(thread_num)))
        st_time = time.time()
        await tasks
        ed_time = time.time()
        print("runtime", count)
        print(f'asyncio finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

    asyncio.run(main())

if __name__ == "__main__":
    single_test_num = 1000000
    thread_num = 2
    thread_speed_test()
    asyncio_speed_test()
I got the following result on my PC:
2000000
threading finished in 0.9332s ,speed 2143159.1985q/s
2000000
asyncio finished in 16.044s ,speed 124657.3379q/s
Addendum:
I realized that as the number of threads increases, the threading version gets slower while the asyncio version gets faster. Here are my test results:
# asyncio #
thread_num   switches per second   average time per switch (ns)
2            122296                8176
32           243502                4106
128          252571                3959
512          253258                3948
4096         239334                4178

# threading #
thread_num   switches per second   average time per switch (ns)
2            2278386               438
4            737829                1350
8            393786                2539
16           367123                2720
32           369260                2708
64           381061                2624
512          381403                2622
Upvotes: 2
Views: 2260
Reputation: 10926
To make the comparison fairer, I changed your code slightly.
I replaced your simple Lock with a Condition. This allowed me to force a thread switch after each iteration of the counter. The Condition.wait() function call always blocks the thread where the call is made; the thread continues only when another thread calls Condition.notify(). Therefore a thread switch must occur.
This is not the case with your test. A task switch will only occur when the thread scheduler causes one, since the logic of your code never causes a thread to block. The Lock.release() function does not block the caller, unlike Condition.wait().
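One way to check this claim is to count how often the lock actually changes hands. The following is a minimal sketch, not part of the original benchmark: the function name count_handoffs and its parameters are hypothetical, and the numbers it prints depend on the machine and the Python version.

```python
from threading import Thread, Lock


def count_handoffs(iterations=100_000, workers=2):
    """Return (total increments, number of times the lock changed owner)."""
    lock = Lock()
    total = 0
    handoffs = 0
    last_owner = None

    def worker(name):
        nonlocal total, handoffs, last_owner
        for _ in range(iterations):
            with lock:
                total += 1
                # A hand-off is an increment made by a different thread
                # than the one that made the previous increment.
                if last_owner is not None and last_owner != name:
                    handoffs += 1
                last_owner = name

    threads = [Thread(target=worker, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, handoffs


if __name__ == "__main__":
    total, handoffs = count_handoffs()
    print(f"{total} increments, {handoffs} hand-offs between threads")
```

If the number of hand-offs is far below the number of increments, the benchmark is mostly measuring uncontended lock acquisition rather than thread switches.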
There is one small difficulty: the last running thread will block forever when it calls Condition.wait() for the last time. That is why I introduced a simple counter to keep track of how many running threads are left. Also, when a thread is finished with its loop it has to make one final call to Condition.notify() in order to release the next thread.
The only change I made to your async test is to replace the "yield" statement with await asyncio.sleep(0). This was for compatibility with Python 3.8. I also reduced the number of trials by a factor of 10.
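As a small illustration of what await asyncio.sleep(0) does, here is a sketch with two tasks that record the order in which they run. The names worker, main, and log are made up for this example.

```python
import asyncio


async def worker(name, log, rounds):
    for _ in range(rounds):
        log.append(name)
        # sleep(0) suspends this task so the event loop can run
        # any other task that is ready.
        await asyncio.sleep(0)


async def main(rounds=3):
    log = []
    await asyncio.gather(worker("A", log, rounds), worker("B", log, rounds))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With CPython's default event loop the two names alternate in the log, which shows that every await asyncio.sleep(0) hands control to the other task.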
Timings were on a fairly old Win10 machine with Python 3.8.
As the timings below show, the threading code is quite a bit slower. That's what I would expect. One reason async/await exists is that it is more lightweight than the threading mechanism.
from threading import Thread, Condition
import time, asyncio

def thread_speed_test():
    def add1():
        nonlocal count
        nonlocal thread_count
        for i in range(single_test_num):
            with mutex:
                mutex.notify()
                count += 1
                if thread_count > 1:
                    mutex.wait()
        thread_count -= 1
        with mutex:
            mutex.notify()

    mutex = Condition()
    count = 0
    thread_count = thread_num
    thread_list = list()
    for i in range(thread_num):
        thread_list.append(Thread(target=add1))
    st_time = time.time()
    for thr in thread_list:
        thr.start()
    for thr in thread_list:
        thr.join()
    ed_time = time.time()
    print("runtime", count)
    print(f'threading finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s', end='\n\n')

def asyncio_speed_test():
    count = 0

    async def switch():
        await asyncio.sleep(0)

    async def add1():
        nonlocal count
        for i in range(single_test_num):
            count += 1
            await switch()

    async def main():
        tasks = asyncio.gather(*(add1() for i in range(thread_num)))
        st_time = time.time()
        await tasks
        ed_time = time.time()
        print("runtime", count)
        print(f'asyncio finished in {round(ed_time - st_time,4)}s ,speed {round(single_test_num * thread_num / (ed_time - st_time),4)}q/s')

    asyncio.run(main())

if __name__ == "__main__":
    single_test_num = 100000
    thread_num = 2
    thread_speed_test()
    asyncio_speed_test()
runtime 200000
threading finished in 4.0335s ,speed 49584.7548q/s
runtime 200000
asyncio finished in 1.7519s ,speed 114160.9466q/s
Upvotes: 1
Reputation: 7121
I am not sure, but you might be comparing apples to oranges.
You are basically penalizing asyncio by forcing it to switch contexts on every iteration, which takes time, while the threads are allowed to run freely.
asyncio is intended for tasks that spend time waiting for input. That is not the case in your benchmark.
For a fair comparison, you should simulate some realistic delay.
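A rough sketch of such a comparison follows. It assumes a fixed 10 ms sleep as a stand-in for a network or disk wait; DELAY, run_threads, and run_tasks are illustrative names, and the timings will differ from machine to machine.

```python
import asyncio
import time
from threading import Thread

DELAY = 0.01  # assumed stand-in for an I/O wait, in seconds


def run_threads(n):
    """Start n threads that each block for DELAY; return elapsed seconds."""
    def worker():
        time.sleep(DELAY)

    threads = [Thread(target=worker) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_tasks(n):
    """Run n coroutines that each await DELAY; return elapsed seconds."""
    async def worker():
        await asyncio.sleep(DELAY)

    async def main():
        start = time.perf_counter()
        await asyncio.gather(*(worker() for _ in range(n)))
        return time.perf_counter() - start

    return asyncio.run(main())


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n}: threads {run_threads(n):.4f}s, asyncio {run_tasks(n):.4f}s")
```

Here each worker waits once instead of switching a million times, which is closer to how a server handling many slow connections behaves.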
Upvotes: 0