Davi Barreira

Reputation: 1681

DocumentDB with Pymongo very slow to query

I'm using SageMaker notebooks together with a DocumentDB database. I'm running my notebook inside the same VPC as the DocumentDB cluster, yet retrieving data takes a huge amount of time. I have roughly 60k documents in a collection. Consider the code below:

numbers = []
for x in collection.find({}, {'number': 1}):
    numbers.append(x)

The field number is just a string identifying the document in another database. As I've said, this should return a list of roughly 60k strings. Nothing very large. Yet it takes a very long time to run (more than 10 min). Is this normal in DocumentDB? What might be going on? When using MongoDB locally on my machine, this was way faster.

Upvotes: 0

Views: 198

Answers (1)

Mihai A

Reputation: 406

First of all, a query without a filter is a collection scan. Depending on the working set size (how large those documents are) and the instance size you're using, it is possible that the results are not cached, hence slower than on your local machine. Try forcing the use of the _id index; it may improve the query speed, like this:

for x in collection.find({'_id': {'$gt': 0}},{'number':1}):
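If only the number values are needed, another option (a sketch, not tested against DocumentDB, assuming pymongo and a live collection handle) is to create an index on number and exclude _id in the projection, so the query can be served from the index alone (a covered query) instead of fetching each document:

```python
# Filter and projection for an index-only read of `number`.
# Excluding _id is what makes a covered query possible once
# an index on `number` exists.
covered_filter = {}
covered_projection = {'number': 1, '_id': 0}

# With a live pymongo handle (the `collection` name is assumed
# from the question), the calls would be:
#
#   collection.create_index('number')
#   numbers = [d['number'] for d in
#              collection.find(covered_filter, covered_projection)]
```

Whether this helps depends on whether building and caching the index fits the instance's memory; measuring with and without the index is the only way to be sure.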

Upvotes: 1
