chandugunturi

Reputation: 91

Hive JDBC Vs CLI client

I need to access data through Hive programmatically (on the order of GBs per query). I am evaluating the CLI driver vs. the Hive JDBC driver.

With JDBC there is the extra overhead of the Thrift server, and I am trying to understand how heavy that is. Can it also become a single-point bottleneck if multiple clients connect to a single Thrift server? Or is it common practice to configure multiple Thrift servers on Hadoop and load-balance across them?

I am looking for better performance rather than faster prototyping. Thanks in advance.

Upvotes: 2

Views: 2042

Answers (3)

techprat

Reputation: 375

You can try using connection pooling. I had a similar issue where submitting a Hive query through JDBC took more time than running it from the Hive CLI.

Also, set a few parameters in your connection string, as below:

jdbc:hive2://servername:portno/;hive.execution.engine=tez;tez.queue.name=alt;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled=true;
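
If you build that string in code, something like the following keeps the settings readable. This is only a minimal sketch: the class name `HiveUrlSketch`, the `buildUrl` helper, the port `10000` and the `default` database are my own placeholders, not anything from your setup. As far as I know, the documented HiveServer2 URL layout is `jdbc:hive2://host:port/dbName;sess_var_list?hive_conf_list#hive_var_list`, so the sketch puts the Hive settings after the `?`. Check that against your driver version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Hive JDBC driver.
final class HiveUrlSketch {

    // Builds jdbc:hive2://host:port/db?key=value;key=value
    // The Hive configuration pairs go after '?', separated by ';'.
    static String buildUrl(String host, int port, String db, Map<String, String> hiveConf) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        if (!hiveConf.isEmpty()) {
            StringJoiner conf = new StringJoiner(";", "?", "");
            hiveConf.forEach((key, value) -> conf.add(key + "=" + value));
            url.append(conf);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in insertion order.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.execution.engine", "tez");
        conf.put("tez.queue.name", "alt");
        conf.put("hive.exec.parallel", "true");
        conf.put("hive.vectorized.execution.enabled", "true");
        conf.put("hive.vectorized.execution.reduce.enabled", "true");

        // "servername", 10000 and "default" are placeholders.
        System.out.println(buildUrl("servername", 10000, "default", conf));
    }
}
```

The resulting string is what you would hand to `DriverManager.getConnection(...)` or to your connection pool's JDBC URL setting.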

Upvotes: 0

Dan Richelson

Reputation: 11

Shengjie's link doesn't work. This one might linkify properly:

http://blog.milford.io/2011/07/productionizing-the-hive-thrift-server/

Upvotes: 1

Shengjie

Reputation: 12796

From a performance point of view, yes, the Thrift server can potentially be both the bottleneck and the single point of failure (SPOF). I've seen people set up multiple Thrift servers talking to a MySQL metastore. Take a look at this: http://blog.milford.io/2011/07/productionizing-the-hive-thrift-server/

Hope it helps.
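
To make the "multiple Thrift servers" idea concrete, here is a rough client-side sketch that rotates over several server URLs. It is an assumption on my part, not what the linked post describes: `ThriftServerRoundRobin` is a made-up class, the hostnames are hypothetical, and a real deployment would more likely put a proper load balancer in front of the servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side rotation over several Thrift server URLs.
final class ThriftServerRoundRobin {
    private final List<String> urls;
    private final AtomicInteger next = new AtomicInteger();

    ThriftServerRoundRobin(List<String> urls) {
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("need at least one URL");
        }
        this.urls = new ArrayList<>(urls); // defensive copy
    }

    // Returns the URLs in turn; floorMod keeps the index valid
    // even after the counter overflows to a negative value.
    String nextUrl() {
        int i = Math.floorMod(next.getAndIncrement(), urls.size());
        return urls.get(i);
    }

    public static void main(String[] args) {
        // Hostnames below are placeholders.
        ThriftServerRoundRobin rr = new ThriftServerRoundRobin(Arrays.asList(
                "jdbc:hive2://hive-a.example.com:10000/default",
                "jdbc:hive2://hive-b.example.com:10000/default"));
        for (int i = 0; i < 3; i++) {
            System.out.println(rr.nextUrl());
        }
    }
}
```

Each caller asks for `nextUrl()` before opening a connection, so new connections are spread across the servers. It does nothing about a server that is down; health checks and failover would have to be added separately.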

Upvotes: 0
