Hive alternative for big data query

Question

Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.

I'm not an expert in database architecture, and I would like to know whether there is an alternative when the assumption above does not hold, that is, when queries are run over a big data set.

Amar · Accepted Answer

There are a few alternatives that can make the queries run significantly faster. I won't go into the details of each, but you can explore the following:

Cloudera Impala: Developed by Cloudera (http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html)
Presto DB: Developed by Facebook (http://prestodb.io/)
Spark SQL: Built on top of Spark (https://spark.apache.org/sql/)

There are a lot of good articles comparing Hive, Impala, and Presto and their performance. You can read them and pick the one that best suits your use case. Here is one that compares their advantages and disadvantages: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/
