Reputation: 10131
From the official Hive documentation:
Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.
I'm not an expert in database architecture, and I would like to know whether there is an alternative when the assumption above does not hold, that is, when queries run over a large data set.
Upvotes: 5
Views: 2163
Reputation: 2314
From your question, I gather that you want to reduce query latency but are fine with keeping HDFS as the data store. In that case you have several alternatives, such as Presto and Spark SQL. Both integrate seamlessly with Hive and offer considerable performance benefits. The other option is to move the data to a NoSQL database. If you want to keep HDFS as the underlying store, HBase can provide some performance benefit; other options include MongoDB and Cassandra.
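To make the Hive integration a bit more concrete, here is a minimal sketch of querying an existing Hive table through Spark SQL from Python. It assumes PySpark is installed and that Spark is configured to reach your Hive metastore. The table `web_logs`, the column `status`, and the helper names `build_count_query` and `run_on_spark` are made up for illustration, so treat this as a starting point rather than a tuned setup.

```python
def build_count_query(table: str, column: str) -> str:
    """Build a simple GROUP BY query (hypothetical helper).

    The table and column names are interpolated directly, so only pass
    trusted identifiers here.
    """
    return f"SELECT {column}, COUNT(*) AS n FROM {table} GROUP BY {column}"


def run_on_spark(query: str):
    """Run the query with Spark SQL against tables registered in Hive.

    Assumption: pyspark is installed and Spark can reach the Hive metastore.
    The import lives inside the function so the query builder above can be
    used without Spark being present.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-alternative-sketch")  # hypothetical application name
        .enableHiveSupport()                 # expose Hive metastore tables to Spark SQL
        .getOrCreate()
    )
    return spark.sql(query)  # returns a DataFrame
```

With a working Spark and Hive setup, `run_on_spark(build_count_query("web_logs", "status")).show()` would print the aggregated rows. The HiveQL stays the same; only the engine executing it changes.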
Upvotes: 0
Reputation: 3845
There are a few alternatives that can make queries run significantly faster. I won't go into the details here, but you can explore the following:
Cloudera Impala: developed by Cloudera (http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html)
Presto DB: developed by Facebook (http://prestodb.io/)
Spark SQL: built on top of Spark (https://spark.apache.org/sql/)
There are many good articles comparing Hive, Impala, and Presto, including their performance. Read a few and pick the engine that best suits your use case. This one compares their advantages and disadvantages: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/
Upvotes: 5