spark-submit --proxy-user does not work in yarn cluster mode

Question

Currently I am using a Cloudera Hadoop single-node cluster (Kerberos enabled).

In client mode I use the following commands:

kinit
spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py

This works fine. In cluster mode I use the following command (no kinit done and no TGT present in the cache):

spark-submit --principal  --keytab  --master yarn-cluster examples/src/main/python/pi.py

This also works fine. But when I use the following command in cluster mode (no kinit done and no TGT present in the cache):

   spark-submit --principal  --keytab  --master yarn-cluster --proxy-user  examples/src/main/python/pi.py

it throws the following error:

   tries to renew a token with renewer

I guess that in cluster mode spark-submit does not look for a TGT on the client machine; it transfers the "keytab" file to the cluster and then starts the Spark job. So why does specifying the "--proxy-user" option make it look for a TGT when submitting in "yarn-cluster" mode? Am I doing something wrong?

Selam Getachew · Accepted Answer

Spark doesn't allow submitting a keytab and principal together with proxy-user. The feature description in the official documentation for YARN mode (second paragraph) states specifically that you need a keytab and principal when you are running long-running jobs. This enables the application to continue working without any security issue.

Imagine if all application users logging into your applications can proxy to your keytab.
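As a rough illustration of the rule above, here is a minimal pre-flight check that refuses an argument list mixing --proxy-user with --principal or --keytab. The function name check_submit_args is hypothetical and not part of Spark; it only mirrors the restriction described in this answer so the mistake is caught before spark-submit is invoked.

```shell
# Hypothetical helper, not part of Spark: rejects argument lists that
# combine --proxy-user with --principal or --keytab.
check_submit_args() {
  has_proxy=0
  has_keytab_login=0
  for arg in "$@"; do
    case "$arg" in
      --proxy-user|--proxy-user=*) has_proxy=1 ;;
      --principal|--principal=*|--keytab|--keytab=*) has_keytab_login=1 ;;
    esac
  done
  if [ "$has_proxy" -eq 1 ] && [ "$has_keytab_login" -eq 1 ]; then
    echo "use either --proxy-user or --principal/--keytab, not both" >&2
    return 1
  fi
  return 0
}
```

It would be called with the same arguments intended for spark-submit, for example check_submit_args "$@" && spark-submit "$@".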

I have to do what Hive does to run "spark-submit": basically, kinit before submitting my application and then provide a proxy-user. So here is how I solved it.

Running kinit @ -k -t and then spark-submit with --proxy-user

is the best implementation. So no, you are not doing anything wrong.
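The two-step approach can be sketched as a small wrapper. This is only an illustration under assumptions: submit_as_proxy, run_cmd and the DRY_RUN variable are hypothetical names introduced here, and the principal, keytab path and proxy user must be supplied by the caller because the original post does not give them. The kinit -k -t form obtains a ticket from a keytab; spark-submit is then called with --proxy-user only.

```shell
# Sketch only: submit_as_proxy, run_cmd and DRY_RUN are hypothetical names.
# With DRY_RUN=1 the commands are printed instead of executed.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: submit_as_proxy PRINCIPAL KEYTAB PROXY_USER APP [APP_ARGS...]
submit_as_proxy() {
  principal=$1
  keytab=$2
  proxy_user=$3
  shift 3
  # Step 1: obtain a TGT from the keytab on the submitting machine.
  run_cmd kinit -k -t "$keytab" "$principal" || return 1
  # Step 2: submit with --proxy-user only (no --principal or --keytab).
  run_cmd spark-submit --master yarn-cluster --proxy-user "$proxy_user" "$@"
}
```

Setting DRY_RUN=1 before calling the function prints the two commands, which is a cheap way to review them before running against a real cluster.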
