Reputation: 9451
I am using Spark with Kerberos authentication.
I can run my code fine using spark-shell, and I can also use spark-submit
in local mode (e.g. --master local[16]). Both function as expected.
local mode -
spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G target/scala-2.10/graphx_sp_2.10-1.0.jar
I am now progressing to run in cluster mode using YARN.
From here I can see that you need to specify the location of the keytab
and the principal. Thus:
spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --principal login_node --deploy-mode cluster --executor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar
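As an aside, `--total-executor-cores` applies to the standalone and Mesos masters; on YARN, executor parallelism is normally set with `--num-executors` and `--executor-cores` instead. A sketch of the same submission with YARN-style flags (paths and class name as above; the 8×4 core split is illustrative):

```shell
# Same job, resources expressed in YARN terms:
# 8 executors x 4 cores = 32 cores total (illustrative split).
spark-submit \
  --class "graphx_sp" \
  --master yarn \
  --deploy-mode cluster \
  --keytab /path/to/keytab \
  --principal login_node \
  --executor-memory 13G \
  --num-executors 8 \
  --executor-cores 4 \
  target/scala-2.10/graphx_sp_2.10-1.0.jar
```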
However, this returns:
Exception in thread "main" java.io.IOException: Login failure for login_node from keytab /path/to/keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:987)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:564)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:978)
... 4 more
Before I run using spark-shell, or spark-submit in local mode, I do the following Kerberos setup:
kinit -k -t ~/keytab -r 7d `whoami`
Clearly, this setup is not extending to the YARN setup. How do I fix the Kerberos issue with YARN in cluster mode? Is this something which must be in my /src/main/scala/graphx_sp.scala file?
By running kinit -V -k -t ~/keytab -r 7d `whoami`
in verbose mode, I was able to see that the principal was in the form user@node.
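To double-check which principals a keytab actually contains (the keytab path here is illustrative), klist can list its entries directly:

```shell
# List every principal stored in the keytab:
# -k selects keytab mode, -t shows entry timestamps.
klist -k -t ~/keytab
```

The principal shown in this listing must match the value passed to --principal exactly, including the realm.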
I updated this, checked the location of the keytab,
and things passed through this checkpoint successfully:
INFO security.UserGroupInformation: Login successful for user user@login_node using keytab file /path/to/keytab
However, it then fails post this with:
client token: N/A
diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required
I have checked the permissions on the keytab and the read permissions are correct. It has been suggested that the next possibility is a corrupt keytab.
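One way to test for a corrupt keytab (principal name illustrative, matching the form found above) is to authenticate with it directly, bypassing any cached credentials:

```shell
# Destroy any cached ticket, then authenticate using only the keytab.
# If this fails, the keytab itself (or its principal) is the problem.
kdestroy
kinit -k -t /path/to/keytab user@login_node
# Confirm a fresh ticket was obtained.
klist
```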
Upvotes: 3
Views: 8451
Reputation: 9451
We found out that the Authentication required
error happens when the application tries to read from HDFS.
Scala was doing lazy evaluation, so it didn't fail until it started
processing the file. The read was from this HDFS line:
webhdfs://name:50070
.
Since WebHDFS defines a public HTTP REST API to permit access, I
thought it was using ACLs
, but enabling ui.view.acls
didn't fix the
issue. Adding --conf
spark.yarn.access.namenodes=webhdfs://name:50070
fixed the
problem. This setting provides a comma-separated list of the secure HDFS namenodes
which the Spark application is going to access. Spark acquires the
security tokens for each of the namenodes so that the application can
access those remote HDFS clusters. This fixed the authentication
required error.
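Putting it together, the working submission looks roughly like this (paths, host names, and resource flags carried over from the question; the namenode address is the illustrative one above):

```shell
spark-submit \
  --class "graphx_sp" \
  --master yarn \
  --deploy-mode cluster \
  --keytab /path/to/keytab \
  --principal user@login_node \
  --conf spark.yarn.access.namenodes=webhdfs://name:50070 \
  --executor-memory 13G \
  target/scala-2.10/graphx_sp_2.10-1.0.jar
```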
Alternatively, direct HDFS access (hdfs://file)
works and authenticates using Kerberos, with the principal and keytab being passed during spark-submit
.
Upvotes: 1