cdhit

Reputation: 1454

Where does Hadoop get its username and group mapping from: the Linux shell users and groups?

Currently I am working on a project to enhance the security of a Hadoop cluster. Eventually I will use Kerberos and Sentry for authentication and authorisation, and I think the username and group mapping will then come from AD/LDAP.

But for now I am just learning and experimenting. One question I haven't figured out is:

Where does the username/group mapping information come from?

As far as I know, Hadoop has no usernames or group names of its own; they come from the client, whether that is the local client machine or the Kerberos realm. But this is still a bit vague to me. Can I get the implementation details?

Does this information come from the machine where the HDFS client runs, or from the Linux shell users and groups on the NameNode? Or does it depend on the context, and are the DataNodes involved as well? What happens if the DataNodes and NameNodes have different users or user-group mappings on their local boxes?

Upvotes: 1

Views: 1838

Answers (2)

pifta

Reputation: 196

On a Linux box, the client, the NameNode, and all other Hadoop services that check usernames and group membership use the id command by default. (I am not sure about the details on Windows clients, but recently this is done via JNI, so there has to be a solution on that side as well.)

This also means that the result depends on the local box's user-group mapping. If you are using Kerberos with whatever backend, or if you have a centralized backend for this via sssd or something similar, you can configure the box's nsswitch.conf so that the id command resolves users and groups through that backend.
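To see what a given box would report, you can run the same kind of lookup by hand. This is only a minimal sketch: it checks the current user, and the variable names are made up for illustration. The optional second step assumes a Hadoop client is on the PATH and that the cluster is reachable; hdfs groups reports the groups as resolved on the cluster side, which can differ from the local view.

```shell
#!/bin/sh
# Local view: the user and groups as resolved through this box's
# nsswitch.conf (local files, sssd, LDAP, ...).
local_user="$(id -un)"
local_groups="$(id -Gn "$local_user")"
echo "local view: $local_user -> $local_groups"

# Optional: the cluster's view of the same user.
# Skipped when no Hadoop client is installed on this machine.
if command -v hdfs >/dev/null 2>&1; then
    hdfs groups "$local_user" || echo "hdfs groups failed (cluster unreachable?)"
fi
```

If the two views disagree, the machines most likely resolve users and groups from different sources.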

Side note: there is a property called hadoop.security.group.mapping that defines the strategy used for the mapping. I do not recommend using LdapGroupsMapping even if you have an LDAP backend; JniBasedUnixGroupsMappingWithFallback seems more reliable and works well.
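To check which strategy a machine is actually configured with, something like the following should work. It is a sketch that assumes a Hadoop client is installed and its configuration directory is picked up as usual; the mapping variable is only there for illustration.

```shell
#!/bin/sh
# Print the group-mapping strategy that this machine's Hadoop
# configuration resolves to. Falls back to a placeholder message
# when no Hadoop client is available.
if command -v hdfs >/dev/null 2>&1; then
    mapping="$(hdfs getconf -confKey hadoop.security.group.mapping 2>/dev/null || echo "lookup failed")"
else
    mapping="hdfs command not found"
fi
echo "hadoop.security.group.mapping = $mapping"
```

Since groups are resolved on the server side for permission checks, the value on the NameNode host is the one that matters most.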

Upvotes: 1

Abdennour TOUMI

Reputation: 93333

When simple authentication is in use (no Kerberos), Hadoop takes the username from an environment variable named HADOOP_USER_NAME, if it is set.

If you want to pass a different username, you can write the following:

HADOOP_USER_NAME=yourname hadoop dfs -put ...

So the command has to start with HADOOP_USER_NAME=VALUE.
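Prefixing a command like this sets the variable for that one command only, not for the rest of the shell session. A small sketch of that scoping, where the username alice is hypothetical and a child sh process stands in for the hadoop command:

```shell
#!/bin/sh
# Start from a clean state so the parent shell has no value set.
unset HADOOP_USER_NAME

# The prefix assignment is visible only to the child process.
child_view="$(HADOOP_USER_NAME=alice sh -c 'printf %s "$HADOOP_USER_NAME"')"

# Back in the parent shell the variable is still unset.
parent_view="${HADOOP_USER_NAME:-<unset>}"

echo "child sees:  $child_view"
echo "parent sees: $parent_view"
```

To apply the override to every later command in the session, use export HADOOP_USER_NAME=VALUE instead.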

Upvotes: 2
