Reputation: 177
I appreciate your help in advance.
I have setup Hadoop in Pseudo Distributed mode using the root user credentials. I want to provide access to multiple users (let us say hadoop1, hadoop2, etc) to be able to submit and run MapReduce jobs on this cluster. How do we get this done?
What I have done so far?
> - Setup Hadoop to run in Pseudo-distributed mode
> - Used "root" user credentials to set this up.
> - Added users hadoop1 and hadoop2 to a group called "hadoop".
> - Added root also to be part of the group "hadoop".
> - Created a folder called hdfstmp and set this as the path for hadoop.tmp.dir.
> - Started the cluster using bin/start-all.sh
> - Ran MapReduce jobs using hadoop1 and hadoop2 users.
I got the error below:
Exception in thread "main" java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at org.apache.hadoop.util.RunJar.main(RunJar.java:119)
However, if I do a stop-all.sh and then do a start-all.sh, the DataNode (and occassionally even NameNode) does not start up. When I check the logs, i see an error as below:
2013-09-21 16:43:54,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data/hdfstmp/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
Now, without change to the group ownership of the hdfstmp directory, my MR jobs submitted by different users do not run. But when the NameNode gets restarted, i get the issue as above.
How do i overcome this issue? What is the best practice for the same?
Also, is there a way to monitor the jobs that are being submitted by the different users? I am assuming the Web UI should allow me to do this. Please confirm.
I appreciate any assistance you can provide me on this issue. Thanks.
Regards
Upvotes: 1
Views: 1691
Reputation: 1729
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop. While that’s not required it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc).
#addgroup hadoop
#adduser --ingroup hadoop hadoop1
#adduser --ingroup hadoop hadoop2
This will add the user hduser and the group hadoop to your local machine.
Change permission of your hadoop installed directory
chown -R hduser:hadoop hadoop
And lastly change hadoop temporary directoy permission
If your temp directory is /app/hadoop/tmp
#mkdir -p /app/hadoop/tmp
#chown hduser:hadoop /app/hadoop/tmp
and if you want to tighten up security, chmod from 755 to 750...
#chmod 750 /app/hadoop/tmp
Upvotes: 0