Reputation: 606
I am new to Python and I want to send async HTTP requests to Cosmos DB from Python to perform a bulk insert. I have tried combining multithreading with asyncio to achieve this. It already performs well, but I believe it can be improved further. Here is the code:
try:
    loop = asyncio.new_event_loop()
    return loop.run_until_complete(save(request.json))
except ValidationException as e:
    return send_error(e)

async def save(self, users):
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor,
                self.__save_to_cosmos,
                user
            )
            for user in users
        ]
        result = await asyncio.gather(*futures)
    return result
Please note that the "__save_to_cosmos" method sends the HTTP request to Cosmos DB using the Python SDK, and it is synchronous code, since the Cosmos DB SDK doesn't support async operations as far as I know.
Can anyone suggest if there is a better way to achieve this task?
Upvotes: 1
Views: 1139
Reputation: 23782
Can anyone suggest if there is a better way to achieve this task?
Well, it's hard to answer such a question perfectly, but I will share some thoughts based on what I know. According to the bulk executor documentation, Cosmos DB only provides bulk executor libraries for .NET and Java, so you need to implement the bulk logic in Python yourself.
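A minimal sketch of what such a hand-rolled bulk helper might look like, assuming a synchronous per-item save function. The names `bulk_save` and `fake_save` are hypothetical; `fake_save` only stands in for the real SDK call, which is not shown here.

```python
import concurrent.futures


def bulk_save(items, save_one, max_workers=20):
    """Call save_one(item) for every item on a thread pool.

    Returns (saved, failed): saved is a list of results, failed is a list of
    (item, exception) pairs, so one bad document does not abort the batch.
    """
    saved, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its item so failures can be reported.
        future_to_item = {executor.submit(save_one, item): item for item in items}
        for future in concurrent.futures.as_completed(future_to_item):
            item = future_to_item[future]
            try:
                saved.append(future.result())
            except Exception as exc:
                failed.append((item, exc))
    return saved, failed


def fake_save(user):
    # Hypothetical stand-in for the synchronous Cosmos DB SDK call.
    if "id" not in user:
        raise ValueError("document has no id")
    return user["id"]
```

Calling `bulk_save(users, fake_save)` returns results in completion order, not input order. Collecting failures separately makes it possible to retry only the documents that did not go through.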
Currently, you use the asyncio package, which uses an event loop to schedule the work. The event loop itself runs in a single process and a single thread, and the worker threads from your ThreadPoolExecutor all live in that same process. On that basis, I think you could use the multiprocessing package to further improve efficiency. Please refer to the multiprocessing package documentation:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.
As I understand it, asyncio runs your tasks in an event loop (a single process), while multiprocessing lets you run several such event loops in separate processes at the same time, so that you can fully leverage the capacity of your machine.
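One way this combination might be sketched, under the assumption that the user list can be split into independent chunks: each worker process gets one chunk and runs its own event loop and thread pool over it. All names here (`chunked`, `save_chunk`, `save_all`, `fake_save`) are hypothetical, and `fake_save` stands in for the synchronous SDK call.

```python
import asyncio
import concurrent.futures


def chunked(items, n_chunks):
    """Split items into at most n_chunks roughly equal, non-empty lists."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def fake_save(user):
    # Hypothetical stand-in for the synchronous Cosmos DB SDK call.
    return user["id"]


async def _save_chunk_async(users, max_workers):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [loop.run_in_executor(executor, fake_save, user) for user in users]
        return await asyncio.gather(*futures)


def save_chunk(users, max_workers=20):
    """Entry point for one worker process: own event loop plus own thread pool."""
    return asyncio.run(_save_chunk_async(users, max_workers))


def save_all(users, processes=4):
    """Fan chunks out to worker processes and flatten the results in order."""
    chunks = chunked(users, processes)
    if not chunks:
        return []
    with concurrent.futures.ProcessPoolExecutor(max_workers=processes) as pool:
        results = pool.map(save_chunk, chunks)
        return [r for chunk_result in results for r in chunk_result]
```

`save_all(users)` should be called from under an `if __name__ == "__main__":` guard, because multiprocessing may re-import the main module in each worker. Since the requests are mostly I/O-bound, it is worth measuring whether the extra processes actually help for your workload.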
You can learn more about the differences between asyncio and multiprocessing from this link. You could also try combining them; please refer to these threads:
1. What kind of problems (if any) would there be combining asyncio with multiprocessing?
2. GitHub: https://github.com/dano/aioprocessing
Upvotes: 1