sr1987

Reputation: 23

Not able to read jceks file in yarn cluster mode in python

I am using a jceks file to decrypt my password, but I am unable to read the encrypted password in YARN cluster mode.

I have tried different methods, including the following:

spark-submit --deploy-mode cluster \
--files /localpath/credentials.jceks#credentials.jceks \
--conf spark.hadoop.hadoop.security.credential.provider.path=jceks://file////localpath/credentials.jceks test.py
spark1 = SparkSession.builder.appName("xyz").master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
x = spark1.sparkContext._jsc.hadoopConfiguration()
x.set("hadoop.security.credential.provider.path", "jceks://file///credentials.jceks")
a = x.getPassword("<password alias>")
passw = ""
for i in range(a.__len__()):
    passw = passw + str(a.__getitem__(i))

I am getting the below error:

AttributeError: 'NoneType' object has no attribute '__len__'

and when I print a, it is None.

Upvotes: 1

Views: 1916

Answers (2)

Manoj .J.V

Reputation: 1

You can refer to the jceks file like this:

# Choosing the jceks file from the Spark staging directory

jceks_location = "jceks://" + "/user/your_user_name/.sparkStaging/" + str(spark.sparkContext.applicationId) + "/credentials.jceks"

x.set("hadoop.security.credential.provider.path", jceks_location)

Upvotes: 0

alvinyyt

Reputation: 19

FWIW, if you put your jceks file on HDFS, the YARN workers will be able to find it when running in cluster mode; at least that works for me. Hope it works for you.

hadoop fs -put ~/.jceks /user/<uid>/.jceks
spark1 = SparkSession.builder.appName("xyz").master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
x = spark1.sparkContext._jsc.hadoopConfiguration()
jceks_hdfs_path = "jceks://hdfs@<host>/user/<uid>/.jceks"
x.set("hadoop.security.credential.provider.path", jceks_hdfs_path)
a = x.getPassword("<password alias>")
passw = ""
# getPassword returns a Java char array; rebuild the password as a Python string
for i in range(a.__len__()):
    passw = passw + str(a.__getitem__(i))

With that, you won't need to specify --files and --conf in the arguments when you run spark-submit. Hope it helps.
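
For illustration, the submit command then reduces to something like this (test.py is the script name from the question):

spark-submit --deploy-mode cluster test.py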

Upvotes: 1
