zzzbbx

Reputation: 10131

Hive alternative for big data query

From the official Hive documentation:

Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.

I'm not an expert in database architecture, and I would like to know whether there is an alternative when the assumption above does not hold, that is, when queries are made over a big data set.

Upvotes: 5

Views: 2163

Answers (2)

kanishka vatsa

Reputation: 2314

From your question, I gather that you want to reduce query latency but are fine with HDFS as the datastore. In that case you have several alternatives, such as Presto and Spark SQL. Both integrate seamlessly with Hive but offer considerable performance benefits. The other option is to move the data to a NoSQL database. If you want to keep HDFS as the datastore, HBase can provide some performance benefit; other options include MongoDB, Cassandra, etc.
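Because Spark SQL can read tables registered in the Hive metastore, trying it out usually does not require moving any data. Here is a minimal sketch, assuming PySpark is installed and configured against your Hive metastore. The table `logs.page_views`, the column `country`, and the helper names `build_query` and `run_on_spark` are hypothetical and only there for illustration.

```python
import re

# Accepts "table" or "database.table"; anything else is rejected so that the
# name can be interpolated into the SQL string safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_query(table: str, limit: int = 10) -> str:
    """Build a simple aggregation query (hypothetical schema: a `country` column)."""
    if not _IDENT.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        f"SELECT country, COUNT(*) AS views "
        f"FROM {table} "
        f"GROUP BY country "
        f"ORDER BY views DESC "
        f"LIMIT {int(limit)}"
    )


def run_on_spark(query: str):
    """Run the query with Spark SQL against tables known to the Hive metastore."""
    # Imported here so the query builder works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-alternative-sketch")
        .enableHiveSupport()  # lets Spark resolve tables via the Hive metastore
        .getOrCreate()
    )
    return spark.sql(query).collect()
```

Calling `run_on_spark(build_query("logs.page_views"))` would then run the same SQL you would have given to Hive, only executed by Spark. Whether it is actually faster depends on your data and cluster, so measure it on your own queries.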

Upvotes: 0

Amar

Reputation: 3845

There are a couple of alternatives that make queries run significantly faster. I won't go into the details of each, but you can explore the following:

  1. Cloudera Impala: Developed by Cloudera http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html

  2. Presto DB: Developed by Facebook http://prestodb.io/

  3. Spark SQL: Built on top of Spark (https://spark.apache.org/sql/)

There are a lot of good articles comparing the performance of Hive, Impala, and Presto. You can read them and pick the engine that best suits your use case. Here is one link that compares their advantages and disadvantages: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/

Upvotes: 5
