Rajat Arora

Reputation: 606

python 3.7 async http requests to cosmos db

I am new to Python and I want to send async HTTP requests to Cosmos DB to perform a bulk insert. I have combined a thread pool with asyncio to achieve this. It already gives good performance, but I believe it can definitely be improved further. Here is the code:

        try:
            loop = asyncio.new_event_loop()
            return loop.run_until_complete(self.save(request.json))
        except ValidationException as e:
            return send_error(e)

    async def save(self, users):
        with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
            loop = asyncio.get_event_loop()
            futures = [
                loop.run_in_executor(
                    executor,
                    self.__save_to_cosmos,
                    user
                )
                for user in users
            ]
            result = await asyncio.gather(*futures)
        return result

Please note that the `__save_to_cosmos` method sends the HTTP request to Cosmos DB using the Python SDK, and it is synchronous code, since the Cosmos DB SDK doesn't support async operations as far as I know.
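For context, the pattern in the question can be isolated into a self-contained sketch. Here `save_to_cosmos` is a stand-in for the real SDK call (it just blocks with `time.sleep`, the way a synchronous HTTP request would); the names are illustrative, not from the actual code. The point it demonstrates is that `run_in_executor` overlaps the blocking calls, so 20 requests finish in roughly the time of one:

```python
import asyncio
import concurrent.futures
import time

def save_to_cosmos(user):
    # Stand-in for the synchronous SDK call: blocks the calling
    # thread for 0.1 s, as a blocking HTTP request would.
    time.sleep(0.1)
    return {"id": user["id"], "status": "created"}

async def save_all(users, max_workers=20):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Each blocking call runs on a worker thread; gather awaits them all.
        futures = [loop.run_in_executor(executor, save_to_cosmos, u) for u in users]
        return await asyncio.gather(*futures)

users = [{"id": str(i)} for i in range(20)]
start = time.monotonic()
results = asyncio.run(save_all(users))
elapsed = time.monotonic() - start
```

Run serially, the 20 calls would take about 2 seconds; with the thread pool they complete in roughly 0.1 s, which is why this approach already performs well despite the synchronous SDK.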

Can anyone suggest a better way to achieve this task?

Upvotes: 1

Views: 1139

Answers (1)

Jay Gong

Reputation: 23782

Can anyone suggest a better way to achieve this task?

Well, it's hard to answer such a question perfectly. I'll share some thoughts based on my knowledge. According to the bulk executor documentation, Cosmos DB only provides bulk executor libraries for .NET and Java, so you need to build the bulk method yourself in Python.

Currently, you use the asyncio package, whose event loop schedules what will be processed. The loop itself runs in a single process on a single thread (only the blocking SDK calls are handed off to worker threads). On that basis, I think you could use the multiprocessing package to further improve efficiency. Please refer to the multiprocessing package documentation:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.

Per my understanding, asyncio runs your tasks in an event loop (a single process), while multiprocessing lets your asyncio tasks run in multiple processes at the same time, so that you can fully leverage the capacity of your machine.

You can learn more about the differences between the two from this link. You could also try combining asyncio with multiprocessing; please refer to these threads:

1.What kind of problems (if any) would there be combining asyncio with multiprocessing?

2.github:https://github.com/dano/aioprocessing

Upvotes: 1
