techprat

Reputation: 375

Improving performance of Hive JDBC

Does anyone know how to improve the performance of a Hive JDBC connection?

Detailed problem:

When I query Hive from the Hive CLI, I get a response within 7 seconds, but through a Hive JDBC connection the response takes 14 seconds. Is there any way (for example, configuration changes) to improve the performance of queries run through a JDBC connection?

Thanks in advance.

Upvotes: 4

Views: 3706

Answers (4)

Srini Sydney

Reputation: 580

To improve the performance of the JDBC connection, use the standard JDBC performance features: connection pooling and prepared statement pooling (available starting with JDBC 3.0). The performance of queries in the Hive CLI can be improved by changing these configuration parameters:

-- enable the cost-based optimizer
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;

-- collect column statistics
analyze table <TABLENAME> compute statistics for columns;

-- enable vectorized query execution
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
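The settings above are session-level `set` commands, so they are not limited to the CLI: a JDBC client can issue them on its own connection. The sketch below assumes the Hive JDBC driver (HiveServer2) accepts `set key=value` through `Statement.execute`, which is worth verifying for your Hive version. `HiveSessionSettings` and its methods are hypothetical names used only for illustration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: applies Hive session settings over a JDBC connection. */
public final class HiveSessionSettings {
    private HiveSessionSettings() {}

    /** The tuning settings listed in the answer, kept in a fixed order. */
    public static Map<String, String> defaults() {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("hive.cbo.enable", "true");
        s.put("hive.compute.query.using.stats", "true");
        s.put("hive.stats.fetch.column.stats", "true");
        s.put("hive.stats.fetch.partition.stats", "true");
        s.put("hive.vectorized.execution.enabled", "true");
        s.put("hive.vectorized.execution.reduce.enabled", "true");
        return s;
    }

    /** Issues one "set key=value" statement per entry (no trailing semicolon over JDBC). */
    public static void apply(Statement stmt, Map<String, String> settings) throws SQLException {
        for (Map.Entry<String, String> e : settings.entrySet()) {
            stmt.execute("set " + e.getKey() + "=" + e.getValue());
        }
    }

    /** Convenience overload: opens a short-lived Statement on the connection and applies the settings. */
    public static void apply(Connection conn, Map<String, String> settings) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            apply(stmt, settings);
        }
    }
}
```

Because the settings live in the session, a client that reuses connections only needs to apply them once per connection rather than before every query.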

Hope this helps.

Upvotes: 0

techprat

Reputation: 375

Using connection pooling helped me improve Hive JDBC performance. Since a lot of work happens on the Hive side for every query, reusing existing connection objects from a connection pool, instead of opening and closing a new connection for each request, was quite helpful.
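The idea of reusing connections can be sketched with only the JDK. This is a minimal illustration under stated assumptions, not a production pool: it has no connection validation, eviction, or timeouts, and it does not cap how many objects are created in total, so a real application would more likely use an established pooling library. `SimplePool` and its `Factory` interface are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical minimal pool: keeps up to maxIdle objects (for example, JDBC
 * connections) and hands them out again instead of creating a new one per request.
 */
public final class SimplePool<T> {
    /** Creates a new pooled object; allowed to throw, like DriverManager.getConnection. */
    @FunctionalInterface
    public interface Factory<T> {
        T create() throws Exception;
    }

    private final Factory<T> factory;
    private final BlockingQueue<T> idle; // thread-safe queue of reusable objects

    public SimplePool(Factory<T> factory, int maxIdle) {
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxIdle);
    }

    /** Returns an idle object if one is available, otherwise creates a new one. */
    public T borrow() throws Exception {
        T obj = idle.poll();
        return (obj != null) ? obj : factory.create();
    }

    /** Hands the object back for reuse; returns false if the pool is full (caller should close it). */
    public boolean release(T obj) {
        return idle.offer(obj);
    }

    /** Number of objects currently waiting to be reused. */
    public int idleCount() {
        return idle.size();
    }
}
```

For Hive, the factory might be `() -> DriverManager.getConnection(url, user, password)` with a `jdbc:hive2://...` URL. After running a query, the caller would call `pool.release(conn)` instead of `conn.close()`, and close the connection only if `release` returns false.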

Please let me know if anyone else is facing the same issue, and I will post a more detailed answer.

Upvotes: 1

Jean de Lavarene

Reputation: 3763

If your database is Oracle, you can try Oracle Table Access for Hadoop and Spark (OTA4H), which can also be used from HiveQL. OTA4H optimizes the JDBC queries that retrieve data from Oracle by using splitters, in order to get the best performance. You can also join Hive tables with external tables that point into Oracle directly in your Hive queries.

Upvotes: 0

Kumar

Reputation: 928

Please try the options below:

  1. If your query has joins, try setting hive.auto.convert.join to true.

  2. Try changing the Java heap size and garbage collection configuration (reference: Link).

  3. Change the execution engine to Tez using set hive.execution.engine=tez; to check the currently set engine, run set hive.execution.engine; without a value.
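Settings such as hive.auto.convert.join and hive.execution.engine can also be attached to the JDBC connection itself rather than issued as separate statements. HiveServer2 JDBC URLs have the general form `jdbc:hive2://<host>:<port>/<db>;<sessionVars>?<hiveConfs>#<hiveVars>`, where the list after `?` is a semicolon-separated set of `key=value` Hive configuration pairs for the session. The sketch below assumes that URL format (check it against the documentation for your Hive version); `HiveUrl.build` is a hypothetical helper, and `localhost`/`10000` are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a HiveServer2 JDBC URL carrying Hive settings. */
public final class HiveUrl {
    private HiveUrl() {}

    /**
     * Builds jdbc:hive2://host:port/db?k1=v1;k2=v2 where the part after '?'
     * is the Hive configuration list for the session (omitted if empty).
     */
    public static String build(String host, int port, String db, Map<String, String> hiveConfs) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        if (!hiveConfs.isEmpty()) {
            StringJoiner confs = new StringJoiner(";");
            for (Map.Entry<String, String> e : hiveConfs.entrySet()) {
                confs.add(e.getKey() + "=" + e.getValue());
            }
            url.append('?').append(confs);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL is deterministic.
        Map<String, String> confs = new LinkedHashMap<>();
        confs.put("hive.execution.engine", "tez");
        confs.put("hive.auto.convert.join", "true");
        // Prints: jdbc:hive2://localhost:10000/default?hive.execution.engine=tez;hive.auto.convert.join=true
        System.out.println(build("localhost", 10000, "default", confs));
    }
}
```

The resulting string would then be passed to `DriverManager.getConnection`, so every connection opened with it starts with these settings already in place.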

More Hive performance configuration tips can be found at this Link.

Please let me know the results.

Upvotes: 0
