Reputation: 71
I enabled permission management in my Hadoop cluster, but I'm facing a problem submitting jobs with Pig. This is the scenario:
1 - I have a hadoop/hadoop user
2 - I have a myuserapp/myuserapp user that runs the Pig script.
3 - We set up the path /myapp to be owned by myuserapp
4 - We set pig.temp.dir to /myapp/pig/tmp (steps 3 and 4 are sketched below)
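For reference, this is roughly what steps 3 and 4 look like (a sketch; adjust the paths to your layout):
# create the Pig temp directory on HDFS and hand the tree to myuserapp
hadoop fs -mkdir /myapp/pig/tmp
hadoop fs -chown -R myuserapp:myuserapp /myapp
# in pig.properties (or passed as -Dpig.temp.dir=... on the pig command line)
pig.temp.dir=/myapp/pig/tmp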
But when Pig tries to run the jobs, we get the following error:
job_201303221059_0009 all_actions,filtered,raw_data DISTINCT Message: Job failed! Error - Job initialization failed: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=realtime, access=EXECUTE, inode="system":hadoop:supergroup:rwx------
The Hadoop JobTracker requires this permission to start up its server.
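For context, the inode named "system" in the error is the JobTracker's mapred.system.dir. You can check its owner and mode with something like this (/mapred is only an example parent path; ours is wherever mapred.system.dir points):
# list the parent of the system dir to see its owner and permissions
hadoop fs -ls /mapred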
My hadoop-policy.xml looks like:
<property>
<name>security.client.datanode.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.inter.tracker.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.job.submission.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
My hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>755</value>
</property>
<property>
<name>dfs.web.ugi</name>
<value>hadoop,supergroup</value>
</property>
My core-site.xml:
...
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
...
And finally, my mapred-site.xml:
...
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred</value>
</property>
<property>
<name>mapreduce.jobtracker.jobhistory.location</name>
<value>/opt/logs/hadoop/history</value>
</property>
Is there a missing configuration? How can I deal with multiple users running jobs in a restricted HDFS cluster?
Upvotes: 1
Views: 1637
Reputation: 584
Your problem is probably the staging directory. Try adding this property to mapred-site.xml:
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
Then make sure that the submitting user (e.g. 'realtime') has a home directory (e.g. '/user/realtime') and that they own it.
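Creating and handing over that home directory would look something like this (run as the HDFS superuser; 'realtime' is just the user from the error message):
# create the submitting user's home directory and make them the owner
hadoop fs -mkdir /user/realtime
hadoop fs -chown realtime:realtime /user/realtime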
Upvotes: 1
Reputation: 71
The Fair Scheduler is designed to run MapReduce jobs as the submitting user, and it creates separate pools for users/groups while sharing the cluster's resources. At first look, there are some issues with this scheduler related to permissions: certain directories do not allow other users to execute/write in places that are necessary for the job to run.
So, one solution is to use the Capacity Scheduler:
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
The Capacity Scheduler uses a number of named queues, where each queue has a configurable number of map and reduce slots. One good thing about it is the ability to place a limit on the percentage of running tasks per user, so that users share the cluster under a quota.
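A minimal queue setup could look something like this (the single 'default' queue and the 25% user limit are only illustrative values). In mapred-site.xml:
<property>
<name>mapred.queue.names</name>
<value>default</value>
</property>
And in capacity-scheduler.xml:
<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
<value>25</value>
</property>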
Upvotes: 0