Reputation: 21
I have a dataset sitting in Azure Cosmos DB for MongoDB, about 3.5 million records. I have a Python script that looks for duplicates in the dataset. It is basically an aggregation pipeline; here is a snippet of the code:
pipeline = [
    {"$group": {
        "_id": f"${id}",
        "count": {"$sum": 1},
        "ids": {"$push": "$_id"}
    }},
    {"$match": {"count": {"$gt": 1}}}
]
max_time_ms = 60000
duplicates = list(collection.aggregate(pipeline))
It is never able to finish the execution and comes back with an error: "Request timed out. Retries due to rate limiting: True., full error: {'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'}". I have also tried running the script with the line below, but it came back with the same error:
duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms, allowDiskUse=True))
Is there anything else I can do?
Upvotes: 0
Views: 199
Reputation: 1768
'ok': 0.0, 'errmsg': 'Request timed out. Retries due to rate limiting: True.', 'code': 50, 'codeName': 'ExceededTimeLimit'
The above error occurs when a collection's provisioned throughput (RUs) is exceeded: requests get rate limited, which surfaces as errors in MongoDB. Make sure to enable Server Side Retry (SSR) so that rate-limited operations are retried automatically; it applies to requests across all collections.
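SSR handles the retries on the server side; if the job is still rate limited, the client can also back off and retry on its own. Here is a minimal sketch, assuming pymongo; the error codes checked (50, seen in the message above, and 16500, Cosmos DB's request-rate-too-large code), the retry count, and the delays are assumptions to adjust:
import time
from pymongo.errors import OperationFailure

def aggregate_with_retry(collection, pipeline, max_retries=5):
    # Retry the aggregation with exponential backoff when it is rejected
    for attempt in range(max_retries):
        try:
            return list(collection.aggregate(pipeline, allowDiskUse=True))
        except OperationFailure as exc:
            # 50 = ExceededTimeLimit (from the error above); 16500 = request rate too large
            if exc.code in (50, 16500) and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off before trying again
            else:
                raise

duplicates = aggregate_with_retry(collection, pipeline)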
Below is an example that successfully retrieves the duplicate documents from the dataset.
from pymongo import MongoClient

CONNECTION_STRING = "*****"
client = MongoClient(CONNECTION_STRING)
db = client['newDb']
collection = db['newColl']

print("Documents in the collection:")
for doc in collection.find():
    print(doc)

# Group documents by the "id" field, count how often each value occurs,
# and collect the _id of every document sharing that value
pipeline = [
    {"$group": {
        "_id": "$id",
        "count": {"$sum": 1},
        "ids": {"$push": "$_id"}
    }},
    {"$match": {"count": {"$gt": 1}}}
]

max_time_ms = 60000
duplicates = list(collection.aggregate(pipeline, maxTimeMS=max_time_ms))

print("\nDuplicate documents:")
for duplicate in duplicates:
    print(f"Duplicate ID: {duplicate['_id']}, Count: {duplicate['count']}, Document IDs: {duplicate['ids']}")

client.close()
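With 3.5 million documents, materializing every group with list() inside a single 60-second window may still time out. As a variation on the duplicates = list(...) line above, the sketch below streams the cursor in batches with allowDiskUse and a larger time budget; it assumes your account tier honors allowDiskUse, and the 300000 ms limit and batch size of 1000 are placeholder values:
# Stream the aggregation results instead of building one large list
cursor = collection.aggregate(
    pipeline,
    maxTimeMS=300000,   # placeholder: a larger budget than the original 60 s
    allowDiskUse=True,  # let the server spill the $group stage to disk
    batchSize=1000      # fetch results in smaller batches
)
for duplicate in cursor:
    print(f"Duplicate ID: {duplicate['_id']}, Count: {duplicate['count']}")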
For more information on the error, refer to this doc.
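Since the error ultimately means the collection's provisioned RUs are being exhausted, raising the collection's throughput can also help. Below is a minimal sketch using the Cosmos DB for MongoDB extension commands via pymongo; it assumes an RU-provisioned (non-serverless) account, and the connection string, collection name, and 10000 RU figure are placeholders:
from pymongo import MongoClient

client = MongoClient("*****")
db = client['newDb']

# Read the collection's current provisioned throughput (Cosmos DB extension command)
print(db.command({"customAction": "GetCollection", "collection": "newColl"}))

# Raise the provisioned throughput for the collection (placeholder value)
db.command({"customAction": "UpdateCollection", "collection": "newColl", "offerThroughput": 10000})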
Upvotes: 0