Reputation: 405
I'm using MongoDB today and I'm really happy with it. I need to find a solution for event logging. The log includes records of content impressions and clicks (like an ad system). It's many writes and few reads (mainly for daily reporting). It seems like something like Cassandra is a better solution than MongoDB, which seems better suited to document-oriented data structures. Any thoughts?
Upvotes: 8
Views: 4598
Reputation: 633
Actually, neither of these databases does analysis by itself. Every time you choose a NoSQL solution, you have to consider how the data will be manipulated.
Cassandra is perfect for writing huge amounts of data with predictable performance, and it is easy to scale across multi-datacenter environments. On the other hand, read performance depends on the consistency level you choose.
MongoDB is perfect for structured data, which in your case is not an advantage. MongoDB ensures that its data is consistent, but that can come at the cost of performance. Moreover, MongoDB is not well suited to multi-datacenter environments.
Regarding data access, they are also totally different. Cassandra provides CQL (an SQL-like language) which doesn't support joins, grouping, etc. In contrast to CQL, MongoDB uses JavaScript and JSON, with its own map/reduce implementation for join-like operations.
To sum up, I think you should weigh all of these facts when choosing one of these databases. From my point of view, Cassandra fits your task well, but you should think carefully about your data model and what kind of queries you are going to run before starting to work with Cassandra.
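To make the "model around your queries" point concrete, here is a minimal sketch (plain Python, with made-up names) of the kind of time-bucketed partition key Cassandra event-log models typically use, so that one day's impressions or clicks land in a single partition that a daily report can scan in one read:

```python
from datetime import datetime, timezone

def partition_key(event_type: str, ts: datetime) -> str:
    """Bucket events by type and day so a daily report reads one partition."""
    return f"{event_type}:{ts.strftime('%Y%m%d')}"

# Two impressions on the same day share a partition; a click gets its own.
a = partition_key("impression", datetime(2011, 5, 1, 9, 30, tzinfo=timezone.utc))
b = partition_key("impression", datetime(2011, 5, 1, 17, 0, tzinfo=timezone.utc))
c = partition_key("click", datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc))
```

The idea is that the key is derived from the query you plan to run (per-type, per-day reporting), not from the shape of the incoming data.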
P.S. For analysis purposes, I'd advise considering SQL engines such as Apache Drill for MongoDB and PrestoDB for Cassandra.
Upvotes: 1
Reputation: 42596
Cassandra is optimised for high write-throughput (many thousands of writes per second), so seems suitable on that criterion, at least. However, if MongoDB performance is good enough for your app, and you are familiar with it, there may not be much advantage to Cassandra.
Upvotes: 1
Reputation: 19377
One of the nice things about Cassandra is its support for Hadoop map/reduce, which gives it access to a very robust ecosystem (e.g., Pig) of tools, examples, and so forth.
Depending on data volume and use case, you may also want to take advantage of its expiring columns feature (http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns).
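An expiring column is set per write by attaching a TTL. As a hedged sketch (the table and column names here are made up), the CQL for such a write, built as a plain statement string you would hand to a driver, looks like:

```python
def expiring_insert(table: str, ttl_seconds: int) -> str:
    """Build a CQL INSERT whose columns expire after ttl_seconds."""
    return (
        f"INSERT INTO {table} (event_id, payload) "
        f"VALUES (?, ?) USING TTL {ttl_seconds}"
    )

# Keep raw log rows for 30 days, then let Cassandra drop them automatically.
stmt = expiring_insert("raw_events", 30 * 24 * 3600)
```

For an append-heavy log this saves you from ever running your own purge job.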
Gemini also recently open-sourced its Cassandra real-time log processing tool, which may be similar to what you want (http://www.thestreet.com/story/11030367/1/gemini-releases-real-time-log-processing-based-on-flume-and-cassandra.html, https://github.com/geminitech/logprocessing).
Upvotes: 6
Reputation: 1390
We used MongoDB in one of our projects to capture event logging for a distributed app. It works really well, and it makes sense to do some calculations beforehand about the amount of storage, sharding, and other factors.
As a suggestion, go with a capped collection and run a map/reduce operation every 24 hours or so to reduce the logs to an aggregate table of the values you want. I have noticed that, because MongoDB is "schema-less", the documents can cause the db file size to grow really fast.
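The 24-hour reduce step can be sketched without a database at all; this is a plain-Python stand-in (the field names are made up) for the job that folds raw log documents into one aggregate row per day and event type:

```python
from collections import Counter

def daily_aggregate(events):
    """Fold raw event docs into {(day, type): count}, like the nightly reduce."""
    counts = Counter()
    for e in events:
        counts[(e["day"], e["type"])] += 1
    return dict(counts)

# A few raw log documents as they might come off the capped collection.
logs = [
    {"day": "2011-05-01", "type": "impression"},
    {"day": "2011-05-01", "type": "impression"},
    {"day": "2011-05-01", "type": "click"},
]
report = daily_aggregate(logs)
```

Once the aggregates are written out, the raw documents can roll off the capped collection without losing the reporting data.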
Upvotes: 4