Reputation: 21
I have a dataset sitting in Azure Cosmos DB for MongoDB, about 3.5 million records. I have a Python script that looks for duplicates in the dataset. It is basically an aggregation pipeline; here is a snippet of the code:
pipeline = [
    {"$group": {
        "_id": f"${id}",
        "count": {"$sum": 1},
        "ids": {"$push": "$_id"}
    }},
    {"$match": {"count": {"$gt": 1}}}
]
max_time_ms = 60000
duplicates = list(collection.aggregate(pipeline))
It is never able to finish the execution and comes back with an error: "Request timed out. Retries due to rate limiting: True., full error: {'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'}". I have also tried running the script with the line below, but it came back with the same error:
duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms, allowDiskUse=True))
Is there anything else I can do?
Upvotes: 0
Views: 199
Reputation: 1768
'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'
The above error occurs when a collection's provisioned throughput (RUs) is exceeded: requests get rate limited, which surfaces as errors in MongoDB. Make sure to enable Server Side Retry (SSR) so that rate-limited operations are retried automatically; it applies to requests across all collections.
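SSR handles the retries on the server side; if the job is still rate limited, the client can also back off and retry on its own. Here is a minimal sketch, assuming pymongo; the error codes checked (50, seen in the message above, and 16500, Cosmos DB's request-rate-too-large code), the retry count, and the delays are assumptions to adjust:
import time
from pymongo.errors import OperationFailure

def aggregate_with_retry(collection, pipeline, max_retries=5):
    # Retry the aggregation with exponential backoff when it is rejected
    for attempt in range(max_retries):
        try:
            return list(collection.aggregate(pipeline, allowDiskUse=True))
        except OperationFailure as exc:
            # 50 = ExceededTimeLimit (from the error above); 16500 = request rate too large
            if exc.code in (50, 16500) and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off before trying again
            else:
                raise

duplicates = aggregate_with_retry(collection, pipeline)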
Below is an example that successfully retrieves the duplicate documents from the dataset.
from pymongo import MongoClient

CONNECTION_STRING = "*****"
client = MongoClient(CONNECTION_STRING)
db = client['newDb']
collection = db['newColl']

print("Documents in the collection:")
for doc in collection.find():
    print(doc)

# Group documents by the "id" field, count how often each value occurs,
# and collect the _id of every document sharing that value
pipeline = [
    {"$group": {
        "_id": "$id",
        "count": {"$sum": 1},
        "ids": {"$push": "$_id"}
    }},
    {"$match": {"count": {"$gt": 1}}}
]

max_time_ms = 60000
duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms))

print("\nDuplicate documents:")
for duplicate in duplicates:
    print(f"Duplicate ID: {duplicate['_id']}, Count: {duplicate['count']}, Document IDs: {duplicate['ids']}")

client.close()
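With 3.5 million documents, materializing every group with list() inside a single 60-second window may still time out. As a variation on the duplicates = list(...) line above, the sketch below streams the cursor in batches with allowDiskUse and a larger time budget; it assumes your account tier honors allowDiskUse, and the 300000 ms limit and batch size of 1000 are placeholder values:
# Stream the aggregation results instead of building one large list
cursor = collection.aggregate(
    pipeline,
    maxTimeMS=300000,   # placeholder: a larger budget than the original 60 s
    allowDiskUse=True,  # let the server spill the $group stage to disk
    batchSize=1000      # fetch results in smaller batches
)
for duplicate in cursor:
    print(f"Duplicate ID: {duplicate['_id']}, Count: {duplicate['count']}")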
For more information on the error, refer to this doc.
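Since the error ultimately means the collection's provisioned RUs are being exhausted, raising the collection's throughput can also help. Below is a minimal sketch using the Cosmos DB for MongoDB extension commands via pymongo; it assumes an RU-provisioned (non-serverless) account, and the connection string, collection name, and 10000 RU figure are placeholders:
from pymongo import MongoClient

client = MongoClient("*****")
db = client['newDb']

# Read the collection's current provisioned throughput (Cosmos DB extension command)
print(db.command({"customAction": "GetCollection", "collection": "newColl"}))

# Raise the provisioned throughput for the collection (placeholder value)
db.command({"customAction": "UpdateCollection", "collection": "newColl", "offerThroughput": 10000})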
Upvotes: 0