Reputation: 706
This is mostly a question about which type of NoSQL solution is most appropriate for this problem.
The Problem
A Java backend system produces "updates" for "parameters" at a rate of about 1000/sec. A parameter is essentially an entity with a value, a type, a name, a description, and a lot of other attached information about its definition, validity, checks, update timestamps, etc. Each update is represented by a Java POJO (~450 bytes in total) containing about 40 fields.
All of these updates (1000/sec) need to be stored for the next 10 years, which comes to about 315 billion updates (1000/sec × roughly 31.5 million seconds per year × 10 years).
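As a rough sanity check of the volume, the arithmetic can be written out from the figures stated above (1000 updates/sec, 10 years, ~450 bytes per update). This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, leap days are ignored, and the result is raw payload size before any replication, indexing, or per-cell storage overhead.

```java
// Back-of-the-envelope estimate; names are illustrative, not from the original system.
final class UpdateVolumeEstimate {
    // 365 days per year, ignoring leap days: 31,536,000 seconds.
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60;

    // Total number of updates produced over the retention period.
    static long totalUpdates(long updatesPerSecond, int years) {
        return updatesPerSecond * SECONDS_PER_YEAR * years;
    }

    // Raw payload size in bytes, assuming every update is stored in full.
    static long rawBytes(long totalUpdates, int bytesPerUpdate) {
        return totalUpdates * bytesPerUpdate;
    }

    public static void main(String[] args) {
        long updates = totalUpdates(1000, 10);
        long bytes = rawBytes(updates, 450);
        System.out.println("updates:  " + updates);
        System.out.println("raw size: " + (bytes / 1_000_000_000_000L) + " TB (approx.)");
    }
}
```

With these inputs the estimate is roughly 315 billion updates and about 142 TB of raw payload if every update is stored in full.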
An important thing to know is that each update has only a small set of fields that change:
Storing all those updates in HBase as independent rows is not feasible because I would end up storing petabytes of data over time, which I cannot afford. I also doubt that retrieval of this data would be responsive.
Another important point is that I need to support very complicated retrieval queries, often with complex filters. Some examples of these queries are shown below:
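Since only a small set of fields changes per update, one way to avoid storing full rows is to persist only the changed fields and rebuild the full state on read. The following is a minimal sketch of that idea, not a description of the actual system: it assumes an update can be flattened to a field-name to value map with a fixed set of fields, and the class `UpdateDelta` and the field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: field names and the map representation are assumptions.
final class UpdateDelta {
    // Returns only the fields of `current` that are new or differ from `previous`.
    // Fields that disappear are not tracked (a fixed field set is assumed).
    static Map<String, Object> diff(Map<String, Object> previous, Map<String, Object> current) {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            boolean isNew = !previous.containsKey(e.getKey());
            if (isNew || !Objects.equals(previous.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    // Rebuilds a full update by applying a delta on top of the previous full state.
    static Map<String, Object> apply(Map<String, Object> previous, Map<String, Object> delta) {
        Map<String, Object> full = new LinkedHashMap<>(previous);
        full.putAll(delta);
        return full;
    }

    public static void main(String[] args) {
        Map<String, Object> previous = new LinkedHashMap<>();
        previous.put("name", "temperature");
        previous.put("value", 41.0);
        previous.put("updatedAt", 1000L);

        Map<String, Object> current = new LinkedHashMap<>(previous);
        current.put("value", 42.5);
        current.put("updatedAt", 1001L);

        Map<String, Object> delta = diff(previous, current);
        System.out.println("stored delta: " + delta);
        System.out.println("rebuilt:      " + apply(previous, delta));
    }
}
```

The trade-off is that reads have to replay deltas from the last full snapshot, so a real design would likely store periodic full snapshots to bound that cost.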
The Question
Is it more appropriate to use a wide-column store such as HBase, or would a document-based solution such as MongoDB be a better fit?
My priority is to keep storage on the order of terabytes (say, below 100-200 TB over the entire period) and query response times on the order of a few seconds (typically 2-3).
I know it's a very broad question, but it would help me to hear the point of view of someone more expert than me!
Many thanks in advance
Upvotes: 0
Views: 69
Reputation: 985
HBase is well suited to key-value workloads with high-volume random read and write access patterns, especially for organizations already heavily invested in HDFS as a common storage layer. The leading Hadoop distributor positioned HBase for “super-high-scale but rather simplistic use cases”.
In comparison with MongoDB, the positioning goes on to state the following: “HBase offers very fast random reads and random writes if you want to look up users on a particular key, but MongoDB provides a much richer model through which you could track user behavior all the way through an online application.”
MongoDB’s design philosophy blends key concepts from relational technologies with the benefits of emerging NoSQL databases. While HBase is highly scalable and performant for a subset of use cases, MongoDB can be used across a broader range of applications. The latter’s intuitive data model, multi-document ACID transactions, rich query framework, native drivers, and lower operational overhead will often enable users to ship new applications faster and more easily than with HBase.
Upvotes: 1