sr1987

Reputation: 23

Not able to read jceks file in yarn cluster mode in python

I am using a jceks file to decrypt my password, but I am unable to read the encrypted password in YARN cluster mode.

I have tried different methods, including the following:

spark-submit --deploy-mode cluster \
--files /localpath/credentials.jceks#credentials.jceks \
--conf spark.hadoop.hadoop.security.credential.provider.path=jceks://file////localpath/credentials.jceks test.py
spark1 = SparkSession.builder.appName("xyz").master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
x = spark1.sparkContext._jsc.hadoopConfiguration()
x.set("hadoop.security.credential.provider.path", "jceks://file///credentials.jceks")
a = x.getPassword("<password alias>")
passw = ""
for i in range(a.__len__()):
    passw = passw + str(a.__getitem__(i))

I am getting the below error:

AttributeError: 'NoneType' object has no attribute '__len__'

and when I print a, it is None.

Upvotes: 1

Views: 1916

Answers (2)

Manoj .J.V

Reputation: 1

You can refer to the jceks file like this:

# Choosing the jceks file from the Spark staging directory

jceks_location = "jceks://" + "/user/your_user_name/.sparkStaging/" + str(spark.sparkContext.applicationId) + "/credentials.jceks"

x.set("hadoop.security.credential.provider.path", jceks_location)

Upvotes: 0

alvinyyt

Reputation: 19

FWIW, if you put your jceks file on HDFS, the YARN workers will be able to find it when running in cluster mode; at least that works for me. Hope it works for you.

hadoop fs -put ~/.jceks /user/<uid>/.jceks
spark1 = SparkSession.builder.appName("xyz").master("yarn").enableHiveSupport().config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").getOrCreate()
x = spark1.sparkContext._jsc.hadoopConfiguration()
jceks_hdfs_path = "jceks://hdfs@<host>/user/<uid>/.jceks"
x.set("hadoop.security.credential.provider.path", jceks_hdfs_path)
a = x.getPassword("<password alias>")
passw = ""
# getPassword returns a Java char array; rebuild the password as a Python string
for i in range(a.__len__()):
    passw = passw + str(a.__getitem__(i))

With that, you won't need to specify --files and --conf in the arguments when you run spark-submit. Hope it helps.
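
For illustration, the submit command then reduces to something like this (test.py is the script name from the question):

spark-submit --deploy-mode cluster test.py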

Upvotes: 1
