giliev

Reputation: 3058

Efficiently retrieve data (ideally all in one batch) with Mongoengine in Python 3

Let's say I have a class User that inherits from the Document class (I am using Mongoengine). I want to retrieve all users who signed up after some timestamp. Here is the method I am using:

@classmethod
def get_users(cls, start_timestamp):
    # Match every user whose ts field is at or after start_timestamp.
    return cls.objects(ts__gte=start_timestamp)

1000 documents are returned in 3 seconds. This is extremely slow. I have done similar queries in SQL in a couple of milliseconds. I am new to MongoDB and NoSQL in general, so I guess I am doing something terribly wrong.

I suspect the retrieval is slow because it is done in several batches. I read somewhere that the batch size for PyMongo is 101, but I do not know if it is the same for Mongoengine.

Can I change the batch size so that I get all documents at once? I will know approximately how much data will be retrieved in total.

Any other suggestions are very welcome.

Thank you!

Upvotes: 0

Views: 1046

Answers (1)

Steve Rossiter

Reputation: 2925

As you suggest, there is no way this query should take 3 seconds. However, the performance of the pymongo driver is not going to be the issue. Some things to consider:

  • Make sure that the ts field is included in the indexes for the user collection. Without an index, MongoDB has to scan the whole collection to evaluate the ts filter.
  • Mongoengine does some aggressive de-referencing, so if the 1000 returned user documents have one or more ReferenceField, each of those results in additional queries. There are ways to avoid this.
  • Mongoengine provides a direct interface to the pymongo method for the MongoDB aggregation framework, which is by far the most efficient way to query MongoDB.
  • MongoDB recently released an official Python ODM, pymodm, in part to provide better default performance than Mongoengine.
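
As a rough sketch of the points above (not a definitive implementation): the index can be declared on the model with `meta = {"indexes": ["ts"]}`, and the query can be written so that it skips de-referencing, returns raw dicts, or goes through the aggregation framework. The function names below are hypothetical, `user_cls` is assumed to be your `User` document class, and the batch size of 1000 is only an illustrative value.

```python
def build_signup_pipeline(start_timestamp):
    """Aggregation stages: keep users with ts >= start_timestamp, oldest first."""
    return [
        {"$match": {"ts": {"$gte": start_timestamp}}},
        {"$sort": {"ts": 1}},
    ]


def get_users_no_deref(user_cls, start_timestamp, batch_size=1000):
    """Return Document instances without following any ReferenceField."""
    queryset = (
        user_cls.objects(ts__gte=start_timestamp)
        .no_dereference()        # skip the extra queries for references
        .batch_size(batch_size)  # documents fetched per cursor batch
    )
    return list(queryset)


def get_users_raw(user_cls, start_timestamp, batch_size=1000):
    """Return plain dicts, skipping Document construction entirely."""
    queryset = (
        user_cls.objects(ts__gte=start_timestamp)
        .as_pymongo()            # yield the raw pymongo dicts
        .batch_size(batch_size)
    )
    return list(queryset)


def get_users_aggregated(user_cls, start_timestamp):
    """Run the same filter through the aggregation framework."""
    pipeline = build_signup_pipeline(start_timestamp)
    return list(user_cls.objects.aggregate(pipeline))
```

Calling `get_users_raw(User, start_timestamp)` and timing it against your current method should show whether the time is going into Mongoengine's document construction and de-referencing rather than into the query itself.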

Upvotes: 3
