Cathal Coffey

Reputation: 1127

Mongodb: keeping a frequently written collection in RAM

I am collecting data from a streaming API and I want to create a real-time analytics dashboard. This dashboard will display a simple timeseries plotting the number of documents per hour. I am wondering if my current approach is optimal.

In the following example, on_data is fired for each new document in the stream.

import simplejson

# Mongo collections (db is an existing pymongo Database handle).
records = db.records
stats = db.records.statistics

def on_data(self, data):
    # Parse the incoming JSON string into a document.
    document = simplejson.loads(data)

    # Insert the new document into records.
    records.insert(document)

    # Update a counter in records.statistics for the hour this document belongs to.
    stats.update({'hour': document['hour']}, {'$inc': {document['hour']: 1}}, upsert=True)

The above works: I get a beautiful graph that plots the number of documents per hour. My question is whether this approach is optimal. I am making two Mongo requests per document: the first inserts the document, the second updates a counter. The stream sends approximately 10 new documents a second.

Is there, for example, any way to tell Mongo to keep db.records.statistics in RAM? I imagine this would greatly reduce disk access on my server.

Upvotes: 2

Views: 1334

Answers (1)

3rf

Reputation: 1164

MongoDB uses memory-mapped files to handle file I/O, so it essentially treats all data as if it were already in RAM and lets the OS figure out the details. In short, you cannot force your collection to stay in memory, but if the operating system handles things well, the stuff that matters will be. Check out this link to the docs for more info on Mongo's memory model and how to tune your OS configuration to best fit your use case: http://docs.mongodb.org/manual/faq/storage/

But to answer your question specifically: you should be fine. Your 10 to 20 writes per second should not be a disk bottleneck in any case (assuming you are not running on ancient hardware). The one thing I would suggest is to build an index on "hour" in stats, if you are not already doing so, so that your updates can find the document to increment much faster.
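For reference, creating that index from pymongo could look something like the sketch below; it assumes the same stats handle from your question and is just a one-off setup step, not part of the per-document path.

    from pymongo import ASCENDING

    # One-off setup: index the field used as the query filter in the update,
    # so each upsert can find its counter document without scanning the collection.
    stats.create_index([('hour', ASCENDING)])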

Upvotes: 3
