William R

Reputation: 739

Can't get Master Kerberos principal for use as renewer for Talend Batch Jobs

We are trying to use Talend Batch (Spark) jobs to access Hive on a Kerberized cluster, but we are getting the error "Can't get Master Kerberos principal for use as renewer".


Using the standard (non-Spark) jobs in Talend, we are able to access Hive without any issue.

Below are the observations:

  1. When we run Spark jobs, Talend is able to connect to the Hive metastore and validate the syntax. For example, if I provide a wrong table name it returns "table not found".
  2. When we run select count(*) on a table with no data, it returns "NULL", but if there is data in HDFS for the table, it fails with the error "Can't get Master Kerberos principal for use as renewer".

I am not sure exactly what is causing the token problem. Could someone help us find the root cause?

One more thing to add: if I read/write to HDFS with Spark Batch jobs instead of Hive, it works. So the problem is only with Hive and Kerberos.

Upvotes: 3

Views: 8527

Answers (3)

If it is a new cluster and the job runs on YARN, please make sure that you have the MR framework uploaded to /user/yarn/mapreduce/mr-framework in HDFS. All YARN jobs will fail with the error below if that path is missing: Error: java.io.IOException: Can't get Master Kerberos principal for use as renewer

$ hdfs dfs -put 3.1.1.7.1.7.2032-1-mr-framework.tar.gz /user/yarn/mapreduce/mr-framework
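
As a quick sanity check (a minimal sketch; the /etc/hadoop/conf path is an assumed default, not something stated in the answer):

$ # confirm the framework archive is actually present in HDFS
$ hdfs dfs -ls /user/yarn/mapreduce/mr-framework
$ # and that mapred-site.xml points YARN jobs at that archive
$ grep -A 1 "mapreduce.application.framework.path" /etc/hadoop/conf/mapred-site.xml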

Upvotes: 0

geosmart

Reputation: 666

I hit the same problem when starting Spark on Kubernetes (k8s):

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.             
: java.io.IOException: Can't get Master Kerberos principal for use as renewer                                                                               
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:133)                                                                                                                                                                                                         
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)                                                                                                                                                                                                         
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)                           
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:243)                             
        at org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:52)                                              
        at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(WholeTextFileRDD.scala:54)                                                                   
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)                                                    
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)                                                                              
        at scala.Option.getOrElse(Option.scala:121)                                                                                                         
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)

I fixed it by adding yarn-site.xml to the HADOOP_CONFIG_DIR.

The yarn-site.xml only needs to contain yarn.resourcemanager.principal:

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/[email protected]</value>
  </property>
</configuration>

This worked for me.
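
For context, a minimal sketch of how that directory is usually exposed, assuming the standard HADOOP_CONF_DIR environment variable and an example path of /etc/hadoop/conf (both are assumptions, not the poster's actual values):

# point Spark at the directory that holds yarn-site.xml (path is an example)
export HADOOP_CONF_DIR=/etc/hadoop/conf
# the directory should list the file carrying yarn.resourcemanager.principal
ls "$HADOOP_CONF_DIR"
# spark-submit picks up HADOOP_CONF_DIR from the environment, so the principal
# is visible when delegation tokens are requested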

Upvotes: 2

sgalinma

Reputation: 200

You should include the Hadoop configuration directory in the classpath (:/path/hadoop-configuration), and that directory should contain all of the configuration files, not only core-site.xml and hdfs-site.xml. This happened to me, and including the full set of files solved the problem.
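
A rough illustration of that setup (the /path/hadoop-configuration placeholder comes from the answer above; the file list shown is typical for a Kerberized Hive/Spark job, not a claim about any specific cluster):

# append the full Hadoop configuration directory to the job classpath
export CLASSPATH="$CLASSPATH:/path/hadoop-configuration"
# check that the directory holds the complete set of client configs
ls /path/hadoop-configuration
# expected: core-site.xml  hdfs-site.xml  yarn-site.xml  mapred-site.xml  hive-site.xml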

Upvotes: 1
