Reputation: 261
I have been using Cloudera's Hadoop (0.20.2). With this version, if I put a file into the filesystem but the directory structure did not exist, it automatically created the parent directories:
So for example, if I had no directories in hdfs and typed:
hadoop fs -put myfile.txt /some/non/existing/path/myfile.txt
It would create all of the directories: some, non, existing and path and put the file in there.
Now, with a newer release of Hadoop (2.2.0), this auto-creation of directories is not happening. The same command yields:
put: `/some/non/existing/path/': No such file or directory
I have a workaround to just do hadoop fs -mkdir first, for every put, but this is not going to perform well.
Is this configurable? Any advice?
Upvotes: 26
Views: 66296
Reputation: 99
The put operation requires the target directory to exist beforehand; it does not create missing directories automatically. You must create the directory first, then run the put command.
To create nested directories in Hadoop, you can use the following command:
hadoop fs -mkdir -p <path/to/nested/directories>
or
hdfs dfs -mkdir -p <path/to/nested/directories>
If you need to delete nested directories in HDFS, you can use the recursive -r option with the rm command:
hadoop fs -rm -r <path/to/nested/directories>
or
hdfs dfs -rm -r <path/to/nested/directories>
These commands allow you to easily manage nested directories in HDFS!
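As a sanity check outside a cluster: the `-p` flag of `hdfs dfs -mkdir` and the `-r` flag of `hdfs dfs -rm` follow the same semantics as local POSIX `mkdir -p` and `rm -r`, so the round trip can be sketched locally (this is a stand-in demo, not run against HDFS):

```shell
# Stand-in for "hdfs dfs -mkdir -p" / "hdfs dfs -rm -r":
# the -p and -r flags behave the same way on the local filesystem.
base=$(mktemp -d)
mkdir -p "$base/path/to/nested/directories"    # creates every missing parent
[ -d "$base/path/to/nested/directories" ] && echo "created"
rm -r "$base/path/to/nested/directories"       # recursive delete
[ -d "$base/path/to/nested/directories" ] || echo "removed"
rm -rf "$base"
```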
Upvotes: 0
Reputation: 742
The put operation does not create the directory if it is not present. We need to create the directory before doing the put operation.
You can use following to create the directory.
hdfs dfs -mkdir -p <path>
-p
This creates any missing parent directories first. If a directory already exists, it does not print an error message and moves on to create the sub-directories.
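To see that idempotent behavior concretely: `-p` has the same contract as POSIX `mkdir -p`, so repeating the call on an existing path exits 0 instead of failing (a local sketch, not run against HDFS):

```shell
d=$(mktemp -d)
mkdir -p "$d/a/b/c"                      # creates a, a/b, and a/b/c
mkdir -p "$d/a/b/c" && echo "no error"   # path already exists: still exit 0
rm -rf "$d"
```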
Upvotes: 4
Reputation: 3043
EDITORIAL NOTE: WARNING THIS ANSWER IS INDICATED TO BE INCORRECT
hadoop fs ...
is deprecated; instead use:
hdfs dfs -mkdir ...
Upvotes: 2
Reputation: 4444
Placing a file into a nonexistent directory in hdfs requires a two-step process. As @rt-vybor stated, use the '-p' option of mkdir to create all missing path elements. But since the OP asked how to place the file into hdfs, the following also performs the hdfs put, and note that you can also (optionally) check that the put succeeded, and conditionally remove the local copy.
First create the relevant directory path in hdfs, and then put the file into hdfs. You want to check that the file exists prior to placing into hdfs. And you may want to log/show that the file has been successfully placed into hdfs. The following combines all the steps.
fn=myfile.txt
if [ -f "$fn" ] ; then
  bfn=$(basename "$fn")   # trim path from filename
  hdfs dfs -mkdir -p /here/is/some/non/existant/path/in/hdfs/
  hdfs dfs -put "$fn" /here/is/some/non/existant/path/in/hdfs/"$bfn"
  hdfs dfs -ls /here/is/some/non/existant/path/in/hdfs/"$bfn" >/dev/null
  success=$?   # exit status 0 means the file landed in hdfs
  if [ $success -eq 0 ] ; then
    echo "remove local copy of file $fn"
    #rm -f "$fn"   # uncomment if you want to remove the file
  fi
fi
And you can turn this into a shell script that takes an hdfs path and a list of files (and creates the path only once):
#!/bin/bash
hdfsp=${1}
shift
hdfs dfs -mkdir -p "$hdfsp"   # create the target path once
for fn in "$@"; do
  if [ -f "$fn" ] ; then
    bfn=$(basename "$fn")   # trim path from filename
    hdfs dfs -put "$fn" "$hdfsp/$bfn"
    hdfs dfs -ls "$hdfsp/$bfn" >/dev/null
    success=$?   # exit status 0 means the file landed in hdfs
    if [ $success -eq 0 ] ; then
      echo "remove local copy of file $fn"
      #rm -f "$fn"   # uncomment if you want to remove the file
    fi
  fi
done
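The exit-status guard in the script is the part worth keeping even outside Hadoop: only delete the local copy after the listing of the uploaded path succeeds. A minimal local sketch of that pattern, using `ls` on a temp file as a stand-in for the `hdfs dfs -ls` check:

```shell
tmp=$(mktemp)            # pretend this is the local file we just uploaded
ls "$tmp" >/dev/null     # stand-in for: hdfs dfs -ls <path-in-hdfs>
if [ $? -eq 0 ] ; then   # exit status 0 means the listing (the upload) succeeded
  echo "safe to remove local copy"
  rm -f "$tmp"
fi
```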
Upvotes: 1