user3031097

Reputation: 177

Hadoop: Pseudo Distributed mode for multiple users

I appreciate your help in advance.

I have set up Hadoop in pseudo-distributed mode using the root user credentials. I want to give multiple users (for example hadoop1, hadoop2, etc.) the ability to submit and run MapReduce jobs on this cluster. How do I get this done?

What I have done so far:

> - Set up Hadoop to run in pseudo-distributed mode
> - Used "root" user credentials to set this up.
> - Added users hadoop1 and hadoop2 to a group called "hadoop".
> - Added root also to be part of the group "hadoop".
> - Created a folder called hdfstmp and set this as the path for hadoop.tmp.dir.
> - Started the cluster using bin/start-all.sh
> - Ran MapReduce jobs using hadoop1 and hadoop2 users.

I got the error below:

Exception in thread "main" java.io.IOException: Permission denied
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:1006)
        at java.io.File.createTempFile(File.java:1989)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:119)

However, if I run stop-all.sh and then start-all.sh, the DataNode (and occasionally even the NameNode) does not start up. When I check the logs, I see the error below:

2013-09-21 16:43:54,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data/hdfstmp/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x

Without changing the group ownership of the hdfstmp directory, MR jobs submitted by the different users do not run. But once I change it and the NameNode gets restarted, I get the error above.

How do I overcome this issue? What is the best practice here?

Also, is there a way to monitor the jobs that are being submitted by the different users? I am assuming the Web UI should allow me to do this. Please confirm.

I appreciate any assistance you can provide me on this issue. Thanks.

Regards

Upvotes: 1

Views: 1691

Answers (1)

user2486495

Reputation: 1729

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).

#addgroup hadoop
#adduser --ingroup hadoop hadoop1
#adduser --ingroup hadoop hadoop2

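This adds the group hadoop and the users hadoop1 and hadoop2 to your local machine. The commands that follow refer to hduser, the dedicated account that owns the Hadoop installation; substitute whichever account runs your Hadoop daemons.

Change the ownership of your Hadoop installation directory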
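As a quick sanity check after creating the accounts, you can confirm that each user really ended up in the hadoop group. This is only a minimal sketch: has_group and user_in_group are hypothetical helper names, and it assumes the standard id -nG command is available.

```shell
#!/bin/sh
# has_group GROUP NAME...: succeed if GROUP appears among the remaining arguments.
has_group() {
    wanted=$1
    shift
    for g in "$@"; do
        if [ "$g" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# user_in_group USER GROUP: succeed if USER belongs to GROUP.
user_in_group() {
    # id -nG prints the user's group names separated by spaces;
    # the substitution is left unquoted on purpose so it splits into words.
    # An unknown user yields an empty list, so the check fails.
    has_group "$2" $(id -nG "$1" 2>/dev/null)
}

# Example, using the names from the commands above:
#   user_in_group hadoop1 hadoop && echo "hadoop1 is in group hadoop"
```

Group changes only take effect for new login sessions, so a user who was already logged in may need to log out and back in before the check reflects reality for their running shell.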

chown -R hduser:hadoop hadoop
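After a recursive chown it can be worth confirming that nothing was missed. The following is a sketch, not part of the original recipe: list_not_owned is a hypothetical helper that prints every path under a directory whose owner or group differs from what you expect, using the standard find tests -user and -group.

```shell
#!/bin/sh
# list_not_owned DIR USER GROUP
# Print every path under DIR that is not owned by USER or not in GROUP.
# USER and GROUP may be names or numeric IDs. Empty output means all matched.
list_not_owned() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example, using the names from the chown command above
# ("hadoop" here is the installation directory, as in that command):
#   list_not_owned hadoop hduser hadoop
```

If the command prints nothing, every file and directory under the installation has the expected owner and group.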

Lastly, change the ownership of the Hadoop temporary directory.

If your temp directory is /app/hadoop/tmp:

#mkdir -p /app/hadoop/tmp
#chown hduser:hadoop /app/hadoop/tmp

If you want to tighten security, change the mode from 755 to 750:

#chmod 750 /app/hadoop/tmp
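The DataNode log in the question reports modes symbolically (rwxr-xr-x versus rwxrwxr-x), while chmod here takes octal values. The sketch below translates between the two and checks the group write bit; sym_to_octal and group_can_write are hypothetical helper names, and the code assumes plain 9-character permission strings and 3-digit octal modes (no setuid, setgid, or sticky bits).

```shell
#!/bin/sh
# sym_to_octal PERMS: convert a 9-character permission string such as
# rwxr-xr-x into its 3-digit octal form. Special bits (s, t) are not handled.
sym_to_octal() {
    perms=$1
    if [ "${#perms}" -ne 9 ]; then
        echo "expected 9 characters, got: $perms" >&2
        return 1
    fi
    result=""
    while [ -n "$perms" ]; do
        rest=${perms#???}          # everything after the first three characters
        triplet=${perms%"$rest"}   # the first three characters
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
        perms=$rest
    done
    echo "$result"
}

# group_can_write MODE: succeed if the group digit of a 3-digit octal
# mode has the write bit (2) set.
group_can_write() {
    group_digit=${1#?}             # drop the owner digit
    group_digit=${group_digit%?}   # drop the "other" digit
    [ $((group_digit & 2)) -ne 0 ]
}

# Examples:
#   sym_to_octal rwxr-xr-x    # prints 755 (what the DataNode expects)
#   sym_to_octal rwxrwxr-x    # prints 775 (what the log reports as actual)
#   group_can_write 750 || echo "group members cannot create files here"
```

Note that both 755 and 750 leave the group with read and execute but not write, so with these modes members of hadoop other than the owner cannot create files directly in that directory.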

Upvotes: 0
