Philipp Sumi

Reputation: 987

MongoDB Atlas read performance

EDIT: That was a red herring - see answer below.

We have a set of databases on MongoDB Atlas (M10, M20 clusters), and I noticed that while planning and executing queries is super fast, actually returning bigger sets of documents takes ages.

As an example, the query below fetches 30K IDs from a collection that contains just 100K documents. Returning that data from an M20 cluster takes 15 seconds, whether to a co-located Node app or to my local machine:

db.mycollection.find({}, {_id: 1}).limit(30000)
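This is roughly how the query is issued from the Node app, as a minimal timing sketch using the official mongodb driver (connection string, database, and collection names are placeholders):

    const { MongoClient } = require("mongodb");

    // Placeholder URI - point this at your own Atlas cluster.
    const client = new MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net");

    async function main() {
      await client.connect();
      const collection = client.db("mydb").collection("mycollection");

      console.time("fetch 30K ids");
      // Same query as above: project only _id, cap at 30K documents.
      const ids = await collection
        .find({}, { projection: { _id: 1 } })
        .limit(30000)
        .toArray();
      console.timeEnd("fetch 30K ids");
      console.log(`received ${ids.length} documents`);
    }

    main().finally(() => client.close());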

In comparison: my toy PostgreSQL instance, which costs a fraction of the M20 cluster and runs in the same AWS region, needs 0.5 seconds to return 100K full rows from a table with 10M rows.

I understand that MongoDB has some serialization overhead due to its JSON-like documents, but the performance difference is so huge that I can't help but wonder: is this really the performance I can expect from MongoDB, or is something severely off with those clusters?

Upvotes: 2

Views: 676

Answers (1)

Philipp Sumi

Reputation: 987

Turns out this was a huge red herring :/

  • After running the query on our production application, some post-processing runs. It turns out this post-processing added huge overhead for a result set this big.
  • Every time I executed the naked query in my local DB client, it took pretty much the same time (minus a little bit that I attributed to the missing post-processing). But what's probably happening is that the client applies very small batch sizes when retrieving the data, or even limits the results behind the scenes, since it shows the data in pages. So a completely different cause led to the same delay, which sent me on a wild goose chase :)

Silver lining: not relying on default batch sizes made a huge difference here. Explicitly setting the batch size to 5K or 10K got me an easy 30% performance boost, including when triggering the queries through the Node client library, as shown in the sketch below.
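As a rough sketch, this is how the batch size can be set explicitly (collection name is a placeholder; 10K is just one of the values that worked well for me). First in the shell:

    db.mycollection.find({}, {_id: 1}).limit(30000).batchSize(10000)

And the equivalent through the Node driver:

    // cursor.batchSize() controls how many documents the server
    // returns per getMore round trip; bigger batches mean fewer
    // round trips for a large result set.
    const ids = await collection
      .find({}, { projection: { _id: 1 } })
      .limit(30000)
      .batchSize(10000)
      .toArray();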

Upvotes: 0
