Reputation: 65
Currently I am using a cloudera hadoop single node cluster (kerberos enabled.)
In client mode I use following commands
kinit
spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py
This works fine. In cluster mode I use following command (no kinit done and no TGT is present in the cache)
spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster examples/src/main/python/pi.py
Also works fine. But when I use following command in cluster mode (no kinit done and no TGT is present in the cache)
spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster --proxy-user <proxy-user> examples/src/main/python/pi.py
throws following error
<proxy-user> tries to renew a token with renewer <myprinc>
I guess in cluster mode the spark-submit do not look for TGT in the client machine... it transfers the "keytab" file to the cluster and then starts the spark job. So why does the specifying "--proxy-user" option looks for TGT while submitting in the "yarn-cluster" mode. Am I doing some thing wrong.
Upvotes: 3
Views: 7642
Reputation: 158
Spark doesn't allow to submit keytab and principal with proxy-user. The feature description in the official documentation for YARN mode (second paragraph) states specifically that you need keytab and principal when you are running long running jobs. This enables the application to continue working with any security issue.
Imagine if all application users logging into your applications can proxy to your keytab.
I have to do what Hive does to run "spark-submit". Basically kinit before submitting my application and then provide a proxy-user. So here is how I solved it.
kinit @ -k -t spark-submit with --proxy-user
is best implementation. So no your are not doing anything wrong.
Upvotes: 3