Reputation: 57
I am attempting to run a Spark job (using spark2-submit) from Oozie, so the job can be run on a schedule.
The job runs just fine when we run the shell script from the command line under our service account (not yarn). When we run it as an Oozie workflow, the following happens:
17/11/16 12:03:55 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied:
user=yarn, access=WRITE, inode="/user":hdfs:supergroup:drwxrwxr-x
Oozie is running the job as the user yarn. IT has denied us any ability to change yarn's permissions in HDFS, and there is not a single reference to the /user
directory in the Spark script. We have attempted to ssh into the server, but this doesn't work directly - we have to ssh from our worker nodes onto the master.
The shell script:
spark2-submit --name "SparkRunner" --master yarn --deploy-mode client --class org.package-name.Runner hdfs://manager-node-hdfs/Analytics/Spark_jars/SparkRunner.jar
Any help would be appreciated.
Upvotes: 0
Views: 2951
Reputation: 9067
From Launching Spark (2.1) on YARN...
spark.yarn.stagingDir
Staging directory used while submitting applications
Default: current user's home directory in the filesystem
So, if you can create an HDFS directory somewhere and grant yarn the required privileges -- i.e. r-x on all parent dirs and rwx on the dir itself -- then ask Spark to use that dir instead of /user/yarn (which does not exist), and you should be fine.
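For example, something along these lines should work -- /Analytics/spark-staging is a hypothetical path, not one from the question, and the submit command reuses the flags from the original script:

```shell
# Create a staging dir that yarn can write to (path is an example)
hdfs dfs -mkdir -p /Analytics/spark-staging
hdfs dfs -chown yarn /Analytics/spark-staging
hdfs dfs -chmod 770 /Analytics/spark-staging

# Point Spark at it instead of the missing /user/yarn home directory
spark2-submit \
  --conf spark.yarn.stagingDir=hdfs://manager-node-hdfs/Analytics/spark-staging \
  --name "SparkRunner" --master yarn --deploy-mode client \
  --class org.package-name.Runner \
  hdfs://manager-node-hdfs/Analytics/Spark_jars/SparkRunner.jar
```

Note that the parent dirs (here / and /Analytics) still need at least r-x for yarn.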
Upvotes: 0
Reputation: 57
I was able to fix this by following https://stackoverflow.com/a/32834087/8099994
At the beginning of my shell script I now include the following line:
export HADOOP_USER_NAME=serviceAccount;
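Put together with the submit command from the question, the full shell script looks like this (serviceAccount stands in for our actual account name):

```shell
#!/bin/bash
# Impersonate the service account instead of yarn for all HDFS access
export HADOOP_USER_NAME=serviceAccount

spark2-submit --name "SparkRunner" --master yarn --deploy-mode client \
  --class org.package-name.Runner \
  hdfs://manager-node-hdfs/Analytics/Spark_jars/SparkRunner.jar
```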
Upvotes: 0
Reputation: 1584
You need to add "<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>" to the shell action of your Oozie workflow.xml, so that Oozie uses the home directory of the user who triggered the workflow rather than yarn's home directory.
e.g
<action name='shellaction'>
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sparksubmitShellScript.sh</exec>
        <argument>${providearg}</argument>
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>${appPath}/sparksubmitShellScript.sh#sparksubmitShellScript.sh</file>
    </shell>
</action>
Modify this as per your workflow. If required, you can also specify the user name directly, rather than using the user who triggered the workflow:
<env-var>HADOOP_USER_NAME=${userName}</env-var>
and then specify userName=usernamevalue in your job.properties.
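For instance, the hardcoded variant pairs this job.properties entry (serviceAccount is a placeholder value) with the ${userName} env-var line above:

```
# job.properties
userName=serviceAccount
```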
Upvotes: 1