Augustin Popa

Reputation: 1543

Logging insert, update and delete operations in MongoDB

I am new to MongoDB. I am looking for a way to log all operations that insert, update, or delete documents for the purposes of maintaining a history of changes. So for instance, I would like to know when a specific field in a specific document was updated, what it was updated to, and what it was before. This is for an ASP.NET C# application using the C# MongoDB driver, so I don't mind whether the solution just involves Mongo itself or C# code.

Example:

ID  |Timestamp |Operation|ObjectID |         PrevValue           |          NewValue
-----------------------------------------------------------------------------------------------
2153|1411390359|    i    |245245...|            null             |{name: "John Smith", age: 35} 
2154|1411390471|    u    |245245...|         {age: 35}           |          {age: 36}
2155|1411390478|    d    |245245...|{name: "John Smith", age: 36}|            null

It doesn't have to follow this exact format, but it should make it easy to track history.
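For concreteness, a log entry like the rows above could be modeled in C# roughly as follows. This is only a sketch: the class and property names are placeholders of my own, and it assumes the MongoDB .NET driver's BSON types.

    using System;
    using MongoDB.Bson;

    // Hypothetical shape of one change-log entry, mirroring the table above.
    public class ChangeLogEntry
    {
        public ObjectId Id { get; set; }             // _id of the log entry itself
        public DateTime Timestamp { get; set; }      // when the change happened
        public string Operation { get; set; }        // "i", "u" or "d"
        public ObjectId TargetId { get; set; }       // _id of the document that changed
        public BsonDocument PrevValue { get; set; }  // fields before the change (null for inserts)
        public BsonDocument NewValue { get; set; }   // fields after the change (null for deletes)
    }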

I took a look at the MongoDB oplog, but it doesn't seem well-suited to this kind of work. To identify the changes made to a document, for example, it looks like I would have to look up two entries (the most recent update entry for the document, and the previous entry for that document, which may be an update or an insert), then compare each field to check which one(s) were changed and what their original values were. I would rather have a single entry containing all of this information, for the sake of efficiency, in case many queries are needed to examine changes to many documents. I want using the log to be as painless as possible. It would also be ideal if the log were stored in a MongoDB collection with a document for each entry (as with the oplog) to make it easy to query.
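To illustrate what reading the oplog from C# would look like, here is a rough sketch. It assumes a replica set (so the oplog lives in local.oplog.rs), the 2.x .NET driver API, and the oplog field names ("ns", "op", "o", "o2") as of MongoDB 2.6; the database and collection names are placeholders.

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    // Sketch: read the two most recent oplog entries for one namespace.
    var client = new MongoClient("mongodb://localhost:27017");
    var oplog = client.GetDatabase("local").GetCollection<BsonDocument>("oplog.rs");

    var filter = Builders<BsonDocument>.Filter.Eq("ns", "mydb.people");
    var latest = oplog.Find(filter)
                      .Sort(Builders<BsonDocument>.Sort.Descending("$natural"))
                      .Limit(2)
                      .ToList();

    // For an "u" entry, "o" holds the modification that was applied (e.g. a $set
    // with the resulting values) and "o2" identifies the document by _id; the
    // previous values are not recorded, hence the cross-referencing described above.
    foreach (var entry in latest)
        Console.WriteLine(entry.ToJson());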

Is there a feature in MongoDB (other than oplog) that can accomplish something like this? If not, is there a good third party tool that can? Or do I have to implement it manually?

EDIT: A few more details:

1) Another problem with the oplog is that it is capped, so it deletes the oldest entries once it runs out of space. I would like to maintain the full history, no matter how old the entries are.

2) The contents of the log will be displayed in some way to the user, who may request to see the history of changes. This is not just for internal records for DB admins, so it needs to be queryable on demand.

EDIT #2: I have been brainstorming some solutions to this problem, but all of them still have some flaws, so I would appreciate further input if someone has a better idea.

Possible Solution 1: Maintain a separate document for each version (with a version #) and a flag if it has been deleted.

Implementation: Add a "version" field to each document. Version 1 is the initial state (representing an insertion), and each subsequent version represents an update, written as a separate, newly created document rather than a modification of the old one. Version -1 would mean the document is now considered "deleted". Obviously this doesn't involve an actual log file.

Problems: Requires a lot of space, and it is costly to query if I need to load historical change data for many documents at once. It is also difficult to determine which specific fields were changed and when, unless I also include a "Changes" field relative to the previous version, which adds even more space. Querying can also get messy and costly when older versions need to be searched alongside the newest ones (there is no way to tell which version is the current one). Even with yet another flag marking the current version, I would need to make sure that flag is updated correctly every time.
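A rough sketch of what an "update" could look like under this scheme, assuming the 2.x .NET driver API; "entityId", "version", and the other names are placeholders:

    using MongoDB.Bson;
    using MongoDB.Driver;

    var collection = new MongoClient("mongodb://localhost:27017")
        .GetDatabase("mydb")
        .GetCollection<BsonDocument>("people");

    var entityId = new ObjectId("545245245245245245245245"); // placeholder id

    // Find the current (highest) version for this logical entity.
    var filter = Builders<BsonDocument>.Filter.Eq("entityId", entityId);
    var current = collection.Find(filter)
                            .Sort(Builders<BsonDocument>.Sort.Descending("version"))
                            .FirstOrDefault();

    // An "update" inserts a brand-new document with the next version number
    // instead of modifying the existing one (version 1 means the original insert).
    var next = new BsonDocument
    {
        { "entityId", entityId },
        { "version", current == null ? 1 : current["version"].AsInt32 + 1 },
        { "name", "John Smith" },
        { "age", 36 }
    };
    collection.InsertOne(next);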

Possible Solution 2: Maintain revision history for each document as a sub-document field

Implementation: Each document has a "RevisionHistory" field that contains each revision along with a version number. This makes it easier to look up historical data for an individual document.

Problems: Harder to perform history lookups across multiple documents, and any API calls that perform updates become more complicated because of the extra field. In particular, I will need to show all changes within the past X hours (where X is supplied by the user), which would require scanning every document if this approach is used.
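A sketch of an update under this scheme, assuming the 2.x .NET driver API; field names are placeholders. Note that the read-then-update below is not atomic, so a concurrent writer could slip in between the two calls.

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    var collection = new MongoClient("mongodb://localhost:27017")
        .GetDatabase("mydb")
        .GetCollection<BsonDocument>("people");

    var id = new ObjectId("545245245245245245245245"); // placeholder id
    var filter = Builders<BsonDocument>.Filter.Eq("_id", id);

    // Read the current state so the pre-update values can be recorded.
    var before = collection.Find(filter).First();
    var revisionCount = before.Contains("RevisionHistory")
        ? before["RevisionHistory"].AsBsonArray.Count
        : 0;

    // Apply the change and push the old values into the embedded history.
    var update = Builders<BsonDocument>.Update
        .Set("age", 36)
        .Push("RevisionHistory", new BsonDocument
        {
            { "version", revisionCount + 1 },
            { "changedAt", DateTime.UtcNow },
            { "prev", new BsonDocument("age", before.GetValue("age", BsonNull.Value)) }
        });
    collection.UpdateOne(filter, update);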

Possible Solution 3: Add a manual Log() function call from C#

Implementation: Every time the MongoDB C# driver API is called to perform a write, the programmer also adds a line calling a dedicated function that writes the corresponding entry to the log.

Problems: Dependent on the programmer actually remembering to add this function call manually and to call it properly. One simple mistake causes the logging system to become unreliable.
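For illustration, the logging helper itself could be as small as this (a sketch only; the class name, collection, and field names are placeholders):

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    public static class ChangeLogger
    {
        // The programmer must remember to call this after every insert/update/delete.
        public static void Log(IMongoCollection<BsonDocument> changeLog, string operation,
                               ObjectId targetId, BsonDocument prev, BsonDocument next)
        {
            changeLog.InsertOne(new BsonDocument
            {
                { "Timestamp", DateTime.UtcNow },
                { "Operation", operation },                  // "i", "u" or "d"
                { "ObjectID", targetId },
                { "PrevValue", (BsonValue)prev ?? BsonNull.Value },
                { "NewValue", (BsonValue)next ?? BsonNull.Value }
            });
        }
    }

The call site would then look something like ChangeLogger.Log(changeLog, "u", id, before, after) next to every update, which is exactly the step that is easy to forget.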

Possible Solution 4: Create a wrapper function for the API operations from C#

Implementation: Programmers don't call the API directly, but call a function that handles the work for them. This requires a lot of work, since every relevant type of API call has to be handled at the wrapper level, but it is consistent and reliable. It is also a good way to abstract away DB access so that nothing gets broken by a wrong low-level API call: somebody not working on the DAL can just call the wrapper, and the wrapper figures out the details, including logging.

Problems: The wrapper function is more challenging to develop as it needs to account for all possible API calls that modify a document.
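Here is a rough sketch of what such a wrapper could look like for the basic insert/update/delete cases, assuming the 2.x .NET driver API; class, method, and field names are all placeholders, and a full implementation would need to cover many more operations:

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    // Callers go through this wrapper instead of the raw driver,
    // so every write is logged in one place.
    public class LoggedCollection
    {
        private readonly IMongoCollection<BsonDocument> _data;
        private readonly IMongoCollection<BsonDocument> _log;

        public LoggedCollection(IMongoCollection<BsonDocument> data,
                                IMongoCollection<BsonDocument> log)
        {
            _data = data;
            _log = log;
        }

        public void Insert(BsonDocument doc)
        {
            _data.InsertOne(doc);
            WriteLog("i", doc["_id"].AsObjectId, null, doc);
        }

        public void Update(ObjectId id, UpdateDefinition<BsonDocument> update)
        {
            var filter = Builders<BsonDocument>.Filter.Eq("_id", id);

            // FindOneAndUpdate returns the document as it was *before* the update,
            // which gives the wrapper the previous values without a separate read.
            var before = _data.FindOneAndUpdate(
                filter,
                update,
                new FindOneAndUpdateOptions<BsonDocument> { ReturnDocument = ReturnDocument.Before });

            // Not atomic with the update above; another writer could slip in between.
            var after = _data.Find(filter).First();
            WriteLog("u", id, before, after);
        }

        public void Delete(ObjectId id)
        {
            var before = _data.FindOneAndDelete(Builders<BsonDocument>.Filter.Eq("_id", id));
            WriteLog("d", id, before, null);
        }

        private void WriteLog(string op, ObjectId id, BsonDocument prev, BsonDocument next)
        {
            _log.InsertOne(new BsonDocument
            {
                { "Timestamp", DateTime.UtcNow },
                { "Operation", op },
                { "ObjectID", id },
                { "PrevValue", (BsonValue)prev ?? BsonNull.Value },
                { "NewValue", (BsonValue)next ?? BsonNull.Value }
            });
        }
    }

Using FindOneAndUpdate with ReturnDocument.Before is what lets the wrapper capture the previous values without a separate read before the write.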

Right now I'm leaning towards solution 4. But again, if there is an easier way to do it, I am interested in hearing it.

Upvotes: 1

Views: 4746

Answers (1)

wdberkeley

Reputation: 11671

No. There's no feature like this in MongoDB as of 2.6. The standard options for tracking CRUD activity are the oplog and the query profile collection. Both of these are capped, so they do not suffice for your purposes. The oplog doesn't store the original state of the document, just instructions on how to set it to the new state: something like $inc will be rewritten as a $set in the oplog entry. If a document isn't changed for longer than the window of the capped collection, then its state before the next change is lost. The profiling collection is meant for performance monitoring and doesn't store changes to documents, just the queries that were sent; to know how a query affected the state of the database, you also need to know a lot about the state of the database at the time the query was run.

I don't know of any third-party tool that does this, so as far as I know you'll have to write it yourself. You will need to implement it as a layer in your application, and access to MongoDB from outside the application won't be tracked. The burden on your application and on how it uses MongoDB will be large; it may be unrealistic to build something like this. For example, suppose your application issues the update

> db.collection.update({ "t" : { "$gte" : 10 } }, { "$inc" : { "t.$" : -1 } })

This finds documents where at least one element of the t array is at least 10, then decrements the first such element of the array. How are you going to track the changes for this update? You can't, unless you issue the corresponding find before the update and then work out the changes yourself, or do the find after the update and cross-reference the results to figure out what changed. What if the application is multithreaded? This approach won't work unless you coordinate all the threads somehow.
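For what it's worth, the "find before the update" workaround would look roughly like this in C# (a sketch using the 2.x .NET driver API; the database and collection names are placeholders), and the gap between the two calls is exactly where another thread can interfere:

    using MongoDB.Bson;
    using MongoDB.Driver;

    var collection = new MongoClient("mongodb://localhost:27017")
        .GetDatabase("mydb")
        .GetCollection<BsonDocument>("things");

    var filter = Builders<BsonDocument>.Filter.Gte("t", 10);
    var update = Builders<BsonDocument>.Update.Inc("t.$", -1);

    // Capture the document(s) the update *should* affect...
    var before = collection.Find(filter).ToList();

    // ...but another thread may modify the data between these two calls,
    // so "before" may no longer describe what the update actually changed.
    collection.UpdateOne(filter, update);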

It might be realistic to track changes using the client if you can keep your threads to one-at-a-time access to the database and your updates are very simple, like just hitting one document at a time.

Upvotes: 2
