Reputation: 3173
We have an OLAP table with 180 billion rows and 100+ columns, taking up close to 8 TB in Hive. Most of the columns are dimensions, and a few are metrics. We would like to build a real-time system that supports ad-hoc queries for dashboard applications, where queries should complete in under 10 seconds.
We are evaluating options for building such a real-time ad-hoc querying system and are struggling to choose the right one. So far we have looked at:
Presto, which can query HDFS directly, but we are not sure whether it can deliver low-latency queries over such a large volume.
Cassandra, to build pre-aggregated views tailored to the queries.
Druid, to build pre-aggregated views. It looks interesting, but it does not seem to have any enterprise support.
We are struggling to choose among these components, and we are also not sure whether we have missed other relevant tools that may suit this requirement.
We are looking for a tool or database that can interact closely with HDFS, but we would also consider any other tool if its read performance is good for large volumes.
I would appreciate guidance on the component selection, and please advise if there are other tools I should look at.
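For a rough sense of scale, here is a back-of-envelope sketch of the scan rate the 10-second target implies. It assumes 8 TB means 8e12 bytes, exactly 100 columns of equal width, and a columnar layout where only the referenced columns are read; none of these are measured values, and `required_scan_rate` is just an illustrative helper.

```python
# Back-of-envelope sizing; all constants are assumptions based on the figures above.
TOTAL_BYTES = 8e12        # ~8 TB, taken as decimal terabytes
TOTAL_ROWS = 180e9        # 180 billion rows
TOTAL_COLUMNS = 100       # "100+" columns, rounded down for simplicity
LATENCY_BUDGET_S = 10     # target query latency in seconds


def required_scan_rate(columns_read: int) -> float:
    """Bytes per second needed to scan `columns_read` columns within the budget.

    Assumes every column has the same width and that a columnar format
    lets the engine skip the columns a query does not touch.
    """
    fraction = columns_read / TOTAL_COLUMNS
    return TOTAL_BYTES * fraction / LATENCY_BUDGET_S


bytes_per_row = TOTAL_BYTES / TOTAL_ROWS
full_scan = required_scan_rate(TOTAL_COLUMNS)   # query touches every column
narrow_scan = required_scan_rate(5)             # query touches 5 columns

print(f"average row size: {bytes_per_row:.1f} bytes")
print(f"full scan:        {full_scan / 1e9:.0f} GB/s aggregate")
print(f"5-column scan:    {narrow_scan / 1e9:.0f} GB/s aggregate")
```

Even under these generous assumptions, a brute-force scan needs a very high aggregate read rate, which is why we are looking at pre-aggregation or partition pruning rather than raw scans alone.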
Upvotes: 1
Views: 525
Reputation: 359
Hi. As you can see at https://cwiki.apache.org/confluence/display/Hive/Druid+Integration, Druid is being integrated closely with Hive, which should let it support your use case fully: part of the data can be queried from a fast data store such as Druid, while heavyweight queries with complex joins go to Hive. Also note that, of the solutions you listed, only Druid has a robust real-time ingestion firehose (sub-second latency) that integrates with Kafka, Storm, Flink, RabbitMQ, and more. On the support side, Druid has a very vibrant open source community and is used by hundreds of companies, including big ones like Yahoo and Netflix. In addition, at least two companies will be providing enterprise support, namely Hortonworks and Imply.
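To illustrate why pre-aggregation helps at this scale, here is a minimal Python sketch of the rollup idea: rows that share a time bucket and the same dimension values collapse into one row with summed metrics. This is not Druid's implementation or API; the `rollup` function, the field names (`ts`, `country`, `device`, `clicks`), and the hourly granularity are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric, granularity="%Y-%m-%d %H:00"):
    """Collapse raw events into one row per (time bucket, dimension values).

    Illustrative only: a real store does this at ingestion time and keeps
    the result in a compressed columnar format.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        bucket = event["ts"].strftime(granularity)   # truncate timestamp to the hour
        key = (bucket,) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Hypothetical raw events; in practice these would come from Hive or a stream.
events = [
    {"ts": datetime(2017, 1, 1, 10, 5), "country": "US", "device": "mobile", "clicks": 3},
    {"ts": datetime(2017, 1, 1, 10, 40), "country": "US", "device": "mobile", "clicks": 2},
    {"ts": datetime(2017, 1, 1, 10, 59), "country": "DE", "device": "desktop", "clicks": 1},
    {"ts": datetime(2017, 1, 1, 11, 1), "country": "US", "device": "mobile", "clicks": 4},
]

rolled = rollup(events, ["country", "device"], "clicks")
for key, agg in sorted(rolled.items()):
    print(key, agg)
```

How much this shrinks the data depends on the cardinality of the dimensions you keep: with 100+ dimension columns, rolling up on all of them may save little, so it is usually worth choosing the subset the dashboards actually filter and group by.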
Upvotes: 1