Queequeg

Set root directory of HDFS in Configuration

I have a dir structure:

/  
   DIR files
   DIR usr

My HDFS is available at hdfs://db:123, so I create the configuration like this:

configuration.set("fs.default.name", "hdfs://db:123");

All directories and paths are then relative to the root (/). I created a directory called files, and that is where I want to keep all my files.

Do I have to manually prepend /files/ to each path in my code, or can I create the configuration like this:

configuration.set("fs.default.name", "hdfs://db:123/files");

and no changes in the code will be necessary?

Answers (1)

Chris White

The paths you pass are usually either relative to the user's HDFS home directory (if there is no leading slash) or absolute (if they are prefixed with /).
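As a rough illustration of that rule, here is a simplified stand-in written against plain strings rather than Hadoop's Path class. The class and method names are hypothetical, and /user/alice is an assumed example home directory; this is a sketch of the behaviour, not Hadoop's implementation.

```java
// Simplified model of the rule above: a leading "/" makes a path absolute,
// anything else is resolved against the working directory (by default the
// user's HDFS home directory). Not Hadoop code; names are hypothetical.
public class PathRule {

    static String resolve(String workingDir, String path) {
        if (path.startsWith("/")) {
            return path; // absolute: the working directory is ignored
        }
        // relative: join onto the working directory with a single "/"
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return base + path;
    }

    public static void main(String[] args) {
        String home = "/user/alice"; // hypothetical home directory
        System.out.println(resolve(home, "input/part-0"));  // /user/alice/input/part-0
        System.out.println(resolve(home, "/files/part-0")); // /files/part-0
    }
}
```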

If you look at the source of Path.makeQualified, you will see a check for whether the path is absolute (this is from Hadoop 1.0.3):

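/** Returns a qualified path object. */
public Path makeQualified(FileSystem fs) {
  Path path = this;
  if (!isAbsolute()) {
    // relative paths are resolved against the file system's working directory
    path = new Path(fs.getWorkingDirectory(), this);
  }
  // ... (rest of the method omitted)
}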

DistributedFileSystem.getWorkingDirectory() returns an instance variable called workingDir, which can be set using the setWorkingDirectory(Path) method. If you don't set the working directory yourself, it defaults to the user's home directory, as can be seen in the DistributedFileSystem.initialize(..) method:

this.workingDir = getHomeDirectory();

And DistributedFileSystem.getHomeDirectory():

public Path getHomeDirectory() {
  return new Path("/user/" + dfs.ugi.getShortUserName()).makeQualified(this);
}
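To make the result concrete, here is a plain-Java approximation (no Hadoop dependency) of what getHomeDirectory() yields for the question's hdfs://db:123 file system. The user name "alice" is a hypothetical placeholder for dfs.ugi.getShortUserName(), and the string concatenation only approximates the visible effect of makeQualified on an absolute path (attaching the file system's scheme and authority); the real method handles more cases.

```java
import java.net.URI;

// Approximates DistributedFileSystem.getHomeDirectory() without Hadoop:
// build "/user/<shortUserName>" and prefix it with the file system's
// scheme and authority. Hypothetical model, not Hadoop code.
public class HomeDirModel {

    static URI homeDirectory(URI fsUri, String shortUserName) {
        String path = "/user/" + shortUserName;
        return URI.create(fsUri.getScheme() + "://" + fsUri.getAuthority() + path);
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://db:123");
        // "alice" stands in for the current user's short name
        System.out.println(homeDirectory(fs, "alice")); // hdfs://db:123/user/alice
    }
}
```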

It doesn't appear that you can configure the working directory through a configuration property, so you will have to call the following before you submit your job (after which all relative paths will be resolved against /files):

// setWorkingDirectory takes an org.apache.hadoop.fs.Path, not a String
FileSystem.get(configuration).setWorkingDirectory(new Path("/files"));
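The effect of that call can be modelled without a cluster. WorkingDirModel below is a hypothetical in-memory stand-in, not a Hadoop class: it starts with the home directory as its working directory (as DistributedFileSystem.initialize(..) does), and shows how relative and absolute paths resolve before and after the working directory is switched to /files. The user name "alice" is an assumed example value. Against a real cluster you would call setWorkingDirectory(new Path("/files")) on the FileSystem instance instead.

```java
import java.net.URI;

// Hypothetical model of a file system's working-directory handling.
// It mirrors the behaviour described above; it is not Hadoop code.
public class WorkingDirModel {

    private final URI fsUri;
    private String workingDir;

    WorkingDirModel(URI fsUri, String shortUserName) {
        this.fsUri = fsUri;
        // default working directory is the user's home directory
        this.workingDir = "/user/" + shortUserName;
    }

    void setWorkingDirectory(String dir) {
        this.workingDir = dir;
    }

    // Relative paths are joined onto the working directory; absolute paths
    // (leading "/") are used as-is. The result is then prefixed with the
    // file system's scheme and authority.
    URI makeQualified(String path) {
        String absolute = path.startsWith("/")
                ? path
                : (workingDir.endsWith("/") ? workingDir : workingDir + "/") + path;
        return URI.create(fsUri.getScheme() + "://" + fsUri.getAuthority() + absolute);
    }

    public static void main(String[] args) {
        WorkingDirModel fs = new WorkingDirModel(URI.create("hdfs://db:123"), "alice");
        System.out.println(fs.makeQualified("data.txt"));       // hdfs://db:123/user/alice/data.txt
        fs.setWorkingDirectory("/files");
        System.out.println(fs.makeQualified("data.txt"));       // hdfs://db:123/files/data.txt
        System.out.println(fs.makeQualified("/usr/other.txt")); // hdfs://db:123/usr/other.txt
    }
}
```

Only paths written without a leading slash pick up the new working directory; an absolute path such as /usr/other.txt resolves exactly as before.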
