Reputation: 1681
I'm using SageMaker notebooks together with a DocumentDB database. I'm running my notebook inside the same VPC as the DocumentDB cluster, yet retrieving data takes a huge amount of time. I have roughly 60k documents in a collection. Consider the code below:
numbers = []
for x in collection.find({}, {'number': 1}):
    numbers.append(x['number'])
The field `number` is just a string identifying the document in another database. As I said, this should return a list of roughly 60k strings, nothing very large. Yet it takes a very long time to run (more than 10 minutes). Is this normal in DocumentDB? What might be going on? With MongoDB running locally on my machine, this used to be way faster.
Upvotes: 0
Views: 198
Reputation: 406
First of all, a query without a filter is a collection scan. Depending on the working set size (how large those documents are) and the instance size you're using, it's possible that the results are not cached, hence slower than on your local machine. Try forcing the use of the `_id` index; it may improve query speed, like this:
for x in collection.find({'_id': {'$gt': 0}}, {'number': 1}):
    numbers.append(x['number'])
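Alternatively, PyMongo can request the index explicitly via the cursor's `hint()` method instead of a synthetic filter. A minimal sketch, assuming `collection` is a `pymongo` Collection already connected to the cluster (`fetch_numbers` is a hypothetical helper name, not from the original post):

```python
def fetch_numbers(collection):
    # Project only the field we need, and drop _id to shrink each
    # document sent over the wire.
    projection = {"number": 1, "_id": 0}
    # hint("_id_") asks the server to walk the built-in _id index
    # rather than performing a full collection scan.
    cursor = collection.find({}, projection).hint("_id_")
    return [doc["number"] for doc in cursor]
```

If round trips are the bottleneck, raising the cursor's `batch_size()` may also help, since fewer network exchanges are needed for 60k small documents.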
Upvotes: 1