Reputation: 11
I having 120 millions records in Hive Table. While executing select
statement with some where
condition or performing any avg
,max(mathematical operation)
query is executing taking 2 to 3 hours. Do I missing some configuration in Hive. Because its seems very unusual that query taking this much time. I am using Hortonworks 2.5 Sandbox on Microsoft Azure Service.
Please Give any solution. Thank you.
Upvotes: 1
Views: 906
Reputation: 336
There are many ways to optimize queries but this might be helpful to you, as per your data set records. As per my understanding and practice I have highlighted few points here :
1: Use Tez Hive can use the Apache Tez execution engine instead of the venerable Map-reduce engine. I want to make a simple recommendation: if it’s not turned on by default in your environment, use Tez by setting to ‘true’ the following in the beginning of your Hive query:
set hive.execution.engine=tez;
With the above setting, every HIVE query you execute will take advantage of Tez.
2: Use ORCFile
Hive supports ORCfile, a new table storage format that supports speed improvements through techniques like predicate push-down, compression and more.
3: Use Vectorization
Vectorized query execution improves performance of operations like scans, aggregations, filters and joins, by performing them in batches of 1024 rows at once instead of single row each time.
Introduced in Hive 0.13, this feature significantly improves query execution time, and is easily enabled with two parameters settings:
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
4: Cost based query optimization
A recent addition to Hive, Cost-based optimization, performs further optimizations based on query cost, resulting in potentially different decisions: how to order joins, which type of join to perform, degree of parallelism and others.
To use cost-based optimization (also known as CBO), set the following parameters at the beginning of your query:
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
5: Write good SQL
SQL is a powerful declarative language. Like other declarative languages, there is more than one way to write a SQL statement. Although each statement’s functionality is the same, it may have strikingly different performance characteristics.
Upvotes: 5