Kok-Lim Wong

Reputation: 123

Unknown host error when running a SQL query against remote Hive

I could not find anything on this after hours of searching on Google, so I hope I can get some ideas about my problem here.

I am trying to get data from a remote Hive cluster using Spark 2. I have followed:

  1. How to connect to a Hive metastore programmatically in SparkSQL?
  2. How to connect to remote hive server from spark

With these, I was able to connect to the remote Hive metastore successfully.

However, my problem starts when I execute a query against the remote Hive, e.g. spark.sql("select count(*) from table"). I get an "unknown host: ns-bigdata" error, where ns-bigdata is the cluster name of the remote cluster.

What else am I missing here? Do I also need to specify where hive.metastore.warehouse.dir should be, e.g. hdfs://local-cluster:8020/user/hive/warehouse?

Thanks in advance.

Upvotes: 0

Views: 848

Answers (2)

Kok-Lim Wong

Reputation: 123

The real reason was that the customer had not set up their Kerberos cert in the Hive Thrift server for cross-realm authentication. We ended up using Impala over JDBC.

Upvotes: 0

Yayati Sule

Reputation: 1631

The Hive server URL is in hive-site.xml. Can you try using that? Also, check whether hive-site.xml is present in the conf/ directory of Spark.
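If copying the config files is not an option, the same settings can usually be passed to Spark programmatically. The sketch below is only an illustration under stated assumptions: it assumes ns-bigdata is an HDFS HA nameservice, which the Spark client cannot resolve unless it knows the NameNode addresses behind that logical name. The function names, host names, ports, and NameNode ids are hypothetical placeholders, not values from the question. It would not address a Kerberos cross-realm problem like the one described in the other answer.

```python
from typing import Dict, Mapping


def remote_hive_conf(
    metastore_uri: str,
    nameservice: str,
    namenodes: Mapping[str, str],
) -> Dict[str, str]:
    """Build Spark settings for a remote metastore and an HA HDFS nameservice.

    `namenodes` maps a NameNode id (e.g. "nn1") to its RPC address "host:port".
    Keys prefixed with "spark.hadoop." are forwarded by Spark to the Hadoop
    configuration.
    """
    if not namenodes:
        raise ValueError("at least one NameNode address is required")

    conf = {
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        # Assumption: only the remote nameservice is needed. If the local
        # cluster is also HA, list both names here, comma-separated.
        "spark.hadoop.dfs.nameservices": nameservice,
        f"spark.hadoop.dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        f"spark.hadoop.dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    # One RPC address entry per NameNode behind the logical nameservice.
    for nn_id, address in namenodes.items():
        key = f"spark.hadoop.dfs.namenode.rpc-address.{nameservice}.{nn_id}"
        conf[key] = address
    return conf


def build_session(conf: Mapping[str, str]):
    """Create a Hive-enabled SparkSession with the given settings."""
    # Imported lazily so the config helper can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("remote-hive").enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

A call might look like build_session(remote_hive_conf("thrift://metastore.example.com:9083", "ns-bigdata", {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"})), with the real values taken from the remote cluster's hive-site.xml and hdfs-site.xml.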

Upvotes: 0
