Andy

Reputation: 21

How to deal with a Request Timed Out error in Python when executing a query against Azure Cosmos DB for MongoDB?

I have a dataset of about 3.5 million records sitting in Azure Cosmos DB for MongoDB. I have a Python script that looks for duplicates in the dataset; it is basically an aggregation pipeline. Here is the relevant snippet:

pipeline = [
    {"$group": {
        "_id": f"${id}",
        "count": {"$sum": 1},
        "ids": {"$push": "$_id"}
    }},
    {"$match": {"count": {"$gt": 1}}}
]

max_time_ms = 60000

duplicates = list(collection.aggregate(pipeline))

It never finishes executing and comes back with this error: "Request timed out. Retries due to rate limiting: True., full error: {'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'}". I also tried running the script with the line below, but it came back with the same error:

duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms, allowDiskUse=True))

Is there anything else I can do?

Upvotes: 0

Views: 199

Answers (1)

Balaji

Reputation: 1768

'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'

The above error occurs when the collection's provisioned throughput (RUs) is exceeded: requests get rate limited, and the rate limiting surfaces in MongoDB as this timeout. Make sure Server-Side Retry (SSR) is enabled so that Cosmos DB automatically retries rate-limited operations; it applies to requests across all collections in the account.
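SSR handles retries on the Cosmos DB side. If you also want a client-side safety net, a generic backoff wrapper can re-run the aggregation when the code-50 timeout surfaces. This is only a sketch: `with_backoff` and its parameters are hypothetical names, not part of pymongo.

```python
import time

def with_backoff(run, is_retryable, max_retries=5, base_delay=2.0):
    """Call run(); if it raises an exception that is_retryable classifies
    as a rate-limit timeout (e.g. Cosmos DB code 50), wait with
    exponential backoff and try again."""
    for attempt in range(max_retries):
        try:
            return run()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_retries - 1 or not is_retryable(exc):
                raise
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

With pymongo this might be used as:

```python
duplicates = with_backoff(
    lambda: list(collection.aggregate(pipeline, maxTimeMS=60000, allowDiskUse=True)),
    lambda exc: getattr(exc, "code", None) == 50,  # Cosmos DB ExceededTimeLimit
)
```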

Below is an example that successfully retrieves the duplicate documents from the dataset.

from pymongo import MongoClient

CONNECTION_STRING = "*****"

client = MongoClient(CONNECTION_STRING)

db = client['newDb']
collection = db['newColl']

# Preview a few documents (avoid printing the whole 3.5M-record collection)
print("Documents in the collection (sample):")
for doc in collection.find().limit(5):
    print(doc)

pipeline = [
    {"$group": {
        "_id": "$id",                  # group on the business key "id"
        "count": {"$sum": 1},          # how many documents share that key
        "ids": {"$push": "$_id"}       # collect the _id of each duplicate
    }},
    {"$match": {"count": {"$gt": 1}}}  # keep only keys appearing more than once
]

max_time_ms = 60000

duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms, allowDiskUse=True))

print("\nDuplicate documents:")
for duplicate in duplicates:
    print(f"Duplicate ID: {duplicate['_id']}, Count: {duplicate['count']}, Document IDs: {duplicate['ids']}")

client.close()

Output: [screenshot showing the duplicate keys with their counts and document IDs]

For more info on the error, refer to this doc.

Upvotes: 0
