Reputation: 7784
I have a number of MongoDB collections that receive JSON documents from various streaming sources. In other words, there are a number of processes continually inserting data into a set of MongoDB collections.
I need a way to stream the data out of MongoDB into downstream applications, so I want a system that conceptually looks like this:
App Stream1 -->
App Stream2 --> MONGODB ---> Aggregated Stream
App Stream3 -->
OR this:
App Stream1 -->         ---> MongoD Stream1
App Stream2 --> MONGODB ---> MongoD Stream2
App Stream3 -->         ---> MongoD Stream3
The question is how do I stream data out of Mongo without having to continually poll/query the database?
The obvious answer would be: "Why don't you change those app streaming processes to send messages to a queue such as RabbitMQ, ZeroMQ or ActiveMQ, which then sends them to both Mongo and your downstream streaming processes at once, like this":
                  MONGODB
                    /|\
                     |
App Stream1 -->      |      ---> MongoD Stream1
App Stream2 --> SomeMQqueue ---> MongoD Stream2
App Stream3 -->             ---> MongoD Stream3
In an ideal world, yes, that would be good, but we need Mongo to ensure that messages are saved first, to avoid duplicates, and to ensure that IDs are all generated, etc. Mongo has to sit in the middle as the persistence layer.
So how do I stream messages out of a Mongo collection (not using GridFS etc.) into these downstream apps? The basic school of thought has been to poll for new documents and update each collected document by adding another field to it, much like a process flag in a SQL table that stores a processed timestamp. I.e. every second, poll for documents where processed == null, set processed = now(), and update the document.
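To make the polling idea concrete, here is a minimal sketch of one poll cycle. It deliberately uses an in-memory list as a stand-in for the collection rather than the MongoDB driver, so the class name `ProcessedFlagPoller`, the `payload` field, and the method names are all made up for illustration; only the `processed` field comes from the description above.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a collection; this is NOT the MongoDB driver.
class ProcessedFlagPoller {
    private final List<Map<String, Object>> collection = new ArrayList<>();

    // Simulates one of the upstream apps inserting a document.
    synchronized void insert(String id, String payload) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("payload", payload);
        collection.add(doc);
    }

    // One poll cycle: select documents that have no "processed" field,
    // stamp them with the current time, and return them for downstream use.
    synchronized List<Map<String, Object>> pollOnce() {
        List<Map<String, Object>> batch = new ArrayList<>();
        Instant now = Instant.now();
        for (Map<String, Object> doc : collection) {
            if (!doc.containsKey("processed")) {
                doc.put("processed", now);
                batch.add(doc);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        ProcessedFlagPoller poller = new ProcessedFlagPoller();
        poller.insert("1", "a");
        poller.insert("2", "b");
        System.out.println(poller.pollOnce().size()); // first poll picks up both documents
        poller.insert("3", "c");
        System.out.println(poller.pollOnce().size()); // second poll picks up only the new one
    }
}
```

Against a real collection, every cycle costs a query plus one update per document, and with more than one consumer the select-and-stamp step would have to be atomic per document to avoid handing the same document out twice.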
Is there a neater/more computationally efficient method?
FYI - These are all Java processes.
Cheers!
Upvotes: 6
Views: 3558
Reputation: 7779
If you are writing to a capped collection (or collections), you can use a tailable cursor to push new data onto the stream, or onto a message queue from which it can be streamed out. However, this will not work for a non-capped collection.
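To illustrate the semantics, here is a small in-memory model of a capped collection with a tailing reader. It is a sketch only and does not use the MongoDB driver: `CappedTailDemo`, `insert`, and `awaitAt` are hypothetical names. With the real Java driver I believe recent versions request this behaviour via `find().cursorType(CursorType.TailableAwait)` on a capped collection, but treat that as a pointer to check against the driver docs rather than tested code.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of a capped collection plus a tailing reader; NOT the MongoDB driver.
class CappedTailDemo {
    private final List<String> docs = new ArrayList<>();
    private final int capacity;
    private long firstSeq = 0; // position of the oldest document still retained

    CappedTailDemo(int capacity) {
        this.capacity = capacity;
    }

    // Documents keep insertion order; once full, the oldest one is overwritten.
    synchronized void insert(String doc) {
        if (docs.size() == capacity) {
            docs.remove(0);
            firstSeq++;
        }
        docs.add(doc);
        notifyAll(); // wake any reader waiting at the tail
    }

    // Wait up to timeoutMillis for the document at position seq.
    // Returns null on timeout or interruption; throws if seq was already overwritten.
    synchronized String awaitAt(long seq, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (seq >= firstSeq + docs.size()) { // nothing new yet: block at the tail
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return null;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        if (seq < firstSeq) throw new IllegalStateException("position was overwritten");
        return docs.get((int) (seq - firstSeq));
    }

    public static void main(String[] args) throws InterruptedException {
        CappedTailDemo capped = new CappedTailDemo(100);
        Thread tailer = new Thread(() -> {
            long next = 0;
            while (true) {
                String doc = capped.awaitAt(next, 200);
                if (doc == null) break; // demo only: stop after an idle period
                System.out.println("streamed: " + doc);
                next++;
            }
        });
        tailer.start();
        for (int i = 1; i <= 3; i++) capped.insert("doc" + i);
        tailer.join();
    }
}
```

The reader never re-queries from the start: it remembers its position and blocks until something new arrives. The catch the model also shows is that a reader that falls too far behind loses its position once older documents are overwritten.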
Upvotes: 3