Puneet Babbar

Spark cluster mode issue reading a Hive-HBase table in a Kerberized environment

Error description

We are not able to execute our Spark job in yarn-cluster or yarn-client mode, though it works fine in local mode.

This issue occurs when we try to read the Hive-HBase tables in a Kerberized cluster.

What we have tried so far

  1. Passing all the HBase jars via the --jars parameter of spark-submit:

--jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar

  2. Passing hbase-site.xml and hive-site.xml via the --files parameter of spark-submit:

--files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab

  3. Doing Kerberos authentication inside the application, explicitly passing the keytab in the code:

    import java.io.IOException
    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
    import org.apache.hadoop.security.UserGroupInformation

    // principle, keyTab and configuration are defined elsewhere in the application
    UserGroupInformation.setConfiguration(configuration)
    val ugi: UserGroupInformation =
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(principle, keyTab)
    UserGroupInformation.setLoginUser(ugi)
    // create the HBase connection under the keytab-based login;
    // this is the tail of a method returning Connection
    return ugi.doAs(new PrivilegedExceptionAction[Connection] {
      @throws[IOException]
      def run: Connection = {
        ConnectionFactory.createConnection(configuration)
      }
    })

  4. Passing the keytab information in the spark-submit command, as sketched below.
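
On YARN, spark-submit provides built-in --principal and --keytab options for this. A minimal sketch of such an invocation, assuming the pasusr keytab shipped via --files above; the realm, application class, and jar name are placeholders:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --principal pasusr@EXAMPLE.COM \
      --keytab /home/pasusr/pasusr.keytab \
      --class com.example.HiveHbaseReader \
      my-spark-job.jar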

  5. Passing the HBase jars via spark.driver.extraClassPath and spark.executor.extraClassPath, as sketched below.
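
A minimal sketch of that attempt, reusing the HBase jars from the --jars list in step 1; the application class and jar name are again placeholders:

    spark-submit \
      --conf spark.driver.extraClassPath=/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar \
      --conf spark.executor.extraClassPath=/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar \
      --class com.example.HiveHbaseReader \
      my-spark-job.jar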

Error Log

18/03/20 15:33:24 WARN TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
18/03/20 15:47:38 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 406, hadoopnode.server.name): java.lang.IllegalStateException: Error while configuring input job properties
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:444)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=50, exceptions:
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

Answers (1)

Puneet Babbar

I was able to resolve this by adding the following configuration in spark-env.sh:

export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar

And by removing the spark.driver.extraClassPath and spark.executor.extraClassPath options, through which I had been passing the above jars, from the spark-submit command.
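
Putting it together, a sketch of the resulting submit command; the application class and jar name are placeholders, and the HBase jars now come from SPARK_CLASSPATH rather than the extraClassPath options:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab \
      --class com.example.HiveHbaseReader \
      my-spark-job.jar

Note that SPARK_CLASSPATH is deprecated in newer Spark releases in favor of the extraClassPath settings, so this workaround applies to older distributions such as the HDP 2.5 stack used here.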
