Reputation: 8614

Polling MongoDb on Indexed Field vs Tailable Cursor

MongoDB's documentation on tailable cursors states the following:

If your query is on an indexed field, do not use tailable cursors, but instead, use a regular cursor. Keep track of the last value of the indexed field returned by the query. To retrieve the newly added documents, query the collection again using the last value of the indexed field in the query criteria.
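The pattern the documentation describes (remember the last indexed value you received, then re-query for anything greater) can be sketched roughly as below. This is a minimal illustration under stated assumptions: `InMemoryCollection`, `find_gt`, `poll_once`, and `poll_loop` are hypothetical names, and the in-memory list only stands in for a collection with an index on `_id`. With a real driver such as PyMongo, the equivalent query would be something like `collection.find({"_id": {"$gt": last_id}}).sort("_id", 1)`.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection with an index on "_id"."""

    def __init__(self) -> None:
        self._docs: List[Doc] = []

    def insert(self, doc: Doc) -> None:
        self._docs.append(doc)

    def find_gt(self, field: str, value: Optional[Any]) -> List[Doc]:
        # Mimics find({field: {"$gt": value}}).sort(field, 1);
        # value=None means "no lower bound yet" (the very first poll).
        matches = [d for d in self._docs if value is None or d[field] > value]
        return sorted(matches, key=lambda d: d[field])


def poll_once(coll: InMemoryCollection, last_id: Optional[Any]) -> Tuple[List[Doc], Optional[Any]]:
    """Fetch documents newer than last_id and return them with the new high-water mark."""
    new_docs = coll.find_gt("_id", last_id)
    if new_docs:
        last_id = new_docs[-1]["_id"]  # results are sorted, so the last one is the largest
    return new_docs, last_id


def poll_loop(coll: InMemoryCollection, on_doc: Callable[[Doc], None],
              interval_s: float, rounds: int, last_id: Optional[Any] = None) -> Optional[Any]:
    """Poll `rounds` times, sleeping interval_s between polls.

    Returns the final high-water mark so a restarted process can resume from it.
    """
    for i in range(rounds):
        docs, last_id = poll_once(coll, last_id)
        for doc in docs:
            on_doc(doc)
        if i < rounds - 1:
            time.sleep(interval_s)
    return last_id
```

Because the loop hands back the last `_id` it saw, a restarted poller can pass that value in as `last_id` and continue where it stopped instead of re-reading everything.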

I'm setting up a query to find all documents after a specific point in time, and then to keep returning documents as they are inserted. I imagine the easiest way of doing this is to query on the _id (provided we're using ObjectIds, which we are) for anything $gt the time I want.

Since _id is indexed by default, how bad is it to continually poll MongoDb with the last _id I got and keep asking for things $gt it? I realize that this would only be within 1 second precision or so, since ObjectIds only store seconds since epoch, but I can live with that, so I assume I'd be querying at least once per second.
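To query `_id` for "anything after a point in time", you need an ObjectId that encodes that time. The first 4 bytes of an ObjectId are seconds since the Unix epoch (big-endian), so zeroing the remaining 8 bytes gives the smallest ObjectId for that second; PyMongo's `bson` package offers `ObjectId.from_datetime` for this. The plain-Python sketch below avoids the driver so it runs on its own; the function names are hypothetical, and treating naive datetimes as UTC is an assumption.

```python
import datetime as dt


def object_id_hex_floor(when: dt.datetime) -> str:
    """Smallest ObjectId (24 hex chars) whose embedded timestamp equals `when`.

    The first 4 bytes hold whole seconds since the epoch (big-endian);
    the remaining 8 bytes are zeroed, giving a lower bound for that second.
    """
    if when.tzinfo is None:
        when = when.replace(tzinfo=dt.timezone.utc)  # assumption: naive datetimes are UTC
    seconds = int(when.timestamp())  # sub-second precision is dropped here
    return seconds.to_bytes(4, "big").hex() + "00" * 8


def object_id_seconds(oid_hex: str) -> dt.datetime:
    """Recover the whole-second creation time embedded in an ObjectId hex string."""
    return dt.datetime.fromtimestamp(int(oid_hex[:8], 16), tz=dt.timezone.utc)
```

A `{"_id": {"$gt": ...}}` filter against this floor value matches documents created during that second or later, and the round trip through `object_id_seconds` shows the 1-second precision mentioned in the question: any fraction of a second is lost.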

I guess I'm just surprised that the documentation recommends the approach of querying (presumably, continually in my case) versus keeping a tailable cursor open: I would have thought that push would be cheaper than pull?

Upvotes: 7

Answers (4)

twg

Reputation: 1105

The answers already offered are great and to the point. However, when I first read your question (and perhaps I don't fully understand what you are trying to do), it sounded to me like exactly the kind of problem Redis was built for. It would be a fairly simple matter of having the cache receive the information, reading it from there, and removing it from the cache when needed.

Also, the number of reads/writes and other operations on the DB would stay reasonable, since you would be polling the cache instead.

Again, maybe I misunderstood the problem, but setting up Redis and using it seems the way to go in this situation. It sounds like a job for a cache.

Upvotes: 0

Ryan Wheale

Reputation: 28380

It sounds like you want to be notified of new, updated, and deleted objects in the DB. This is not possible with MongoDB without a little trickery. I'm guessing you've read about tailing the oplog with tailable cursors, and polling is always an absolute last resort. I've never tried those approaches, as they seem a little limiting (you can't use them in shared DB environments) and unreliable, not to mention difficult to set up (they require replica sets) and prone to change at any time without warning. For example, the once-popular mongo-watch library is no longer maintained, having been superseded by better alternatives.

Some DBs implement "mutation events": Postgres has triggers, and RethinkDB actually pushes changes out to you. If you can switch to something like RethinkDB, that would be ideal.

If not, my best advice is to put a service layer in front of your DB through which all traffic must pass. Client applications can connect to these services via sockets (trivial with socket.io, which is implemented in nearly every language). Any time your service layer processes an update, insert, or delete, you can emit those events to anybody currently connected.
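A minimal in-process sketch of that service-layer idea is below. It is illustrative only: `MessageService`, `subscribe`, and the `created`/`updated`/`removed` event names are hypothetical and are not the socket.io or FeathersJS API; a dict stands in for the database, and plain callbacks stand in for socket connections (in a real deployment, `_emit` would broadcast to connected sockets).

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Listener = Callable[[str, Doc], None]


class MessageService:
    """Hypothetical service layer: all reads and writes go through it, and
    every successful write is emitted to the currently connected listeners."""

    def __init__(self) -> None:
        self._db: Dict[Any, Doc] = {}          # stand-in for the real database
        self._listeners: List[Listener] = []   # stand-in for connected sockets

    def subscribe(self, listener: Listener) -> Callable[[], None]:
        """Register a listener; returns a function that unsubscribes it."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def _emit(self, event: str, doc: Doc) -> None:
        for listener in list(self._listeners):  # copy so listeners may unsubscribe mid-emit
            listener(event, dict(doc))

    def get(self, doc_id: Any) -> Doc:
        return dict(self._db[doc_id])

    def create(self, doc: Doc) -> Doc:
        self._db[doc["_id"]] = dict(doc)
        self._emit("created", doc)
        return doc

    def update(self, doc_id: Any, changes: Doc) -> Doc:
        doc = self._db[doc_id]
        doc.update(changes)
        self._emit("updated", doc)
        return dict(doc)

    def remove(self, doc_id: Any) -> Doc:
        doc = self._db.pop(doc_id)
        self._emit("removed", doc)
        return doc
```

Listeners only hear about writes that pass through the service; anything that writes to the database directly bypasses `_emit`, which is exactly the caveat listed for this approach.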

Constraints with this approach

All db communication should go through the service layer.

Caveats with this approach

If something updates the database directly, you wouldn't see those changes immediately. You'd have to query the db again. Not the end of the world.

Pros for using this approach

It's way better, more performant, and more real-time than polling.
You have a service tier which can do a lot more business stuff with your data, such as emitting events whenever data changes, validating data, sending emails, logging, updating other data sources, etc. ;)
It's a paradigm which works with any language, any db.
There are lightweight frameworks out there that implement this architecture already. FeathersJS is my favorite. You should really check it out. If you cannot use NodeJS, you should at least take a hint from how Feathers services work.

Upvotes: 2

Rahul

Reputation: 16335

If you go with tailable cursors, there are a few issues I can think of:

You have to receive every message in the collection before you get to the 'end'.
You have to go back to the beginning if you ever exhaust the cursor (and its await_data delay). So after an application restart, a DB restart, etc., you have no option but to iterate from the beginning.

In addition to the above, there are a few more caveats with tailable cursors, given that they work only on capped collections:

Scalability is limited by the number of connections: each client connection adds a connection thread on the mongod servers (or mongos).
Capped collections have a fixed maximum size. The documents cannot grow beyond that size.
You cannot shard a capped collection
Any updates to documents in a capped collection must not cause a document to grow. (i.e. not all $set operations will work, and no $push or $pushAll will)
You may not explicitly .remove() documents from a capped collection
You don't have any control on which document gets removed from the collection. It will act like a circular queue.
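The circular-queue behavior, and why sizing matters for a reader that falls behind, can be illustrated with a toy model. This is a sketch under loud assumptions: `CappedModel` and `read_after` are hypothetical, the model is bounded by document count only (real capped collections are bounded by size in bytes, with an optional maximum document count), and the sequence numbers are an illustration device rather than anything MongoDB exposes.

```python
from collections import deque
from typing import Any, List, Optional, Tuple


class CappedModel:
    """Toy capped collection: keeps insertion order and evicts the oldest
    document first once full, like a circular queue."""

    def __init__(self, max_docs: int) -> None:
        self._docs: deque = deque(maxlen=max_docs)  # deque drops from the left when full
        self._next_seq = 0

    def insert(self, doc: Any) -> None:
        self._docs.append((self._next_seq, doc))
        self._next_seq += 1

    def read_after(self, last_seq: Optional[int]) -> Tuple[List[Any], bool, Optional[int]]:
        """Return (docs newer than last_seq, whether some were evicted unseen, new last_seq)."""
        start = -1 if last_seq is None else last_seq
        newer = [(s, d) for s, d in self._docs if s > start]
        oldest = self._docs[0][0] if self._docs else None
        missed = oldest is not None and oldest > start + 1  # a gap means evicted before read
        new_last = newer[-1][0] if newer else last_seq
        return [d for _, d in newer], missed, new_last
```

With room for three documents and five inserts, a fresh reader only sees the last three and is told it missed the first two; a collection sized too small for the insert rate relative to how often readers catch up loses data silently in the same way.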

how bad is it to continually poll MongoDb with the last _id I got and keep asking for things $gt it?

IMO, polling does introduce latency and unnecessary busy-waiting even when there are no updates, but it leaves a lot under your control.

Performance wise, there shouldn't be an issue as long as you are using an indexed field for querying.

Upvotes: 2

helmy

Reputation: 9497

There's a big caveat here that I think you might have overlooked: tailable cursors only work on capped collections. A capped collection is probably not a general-purpose solution; it requires careful planning to size it appropriately for your data size and growth.

Upvotes: 3
