Reputation: 261
I have been using Cloudera's Hadoop (0.20.2). With this version, if I put a file into the filesystem but the directory structure did not exist, it automatically created the parent directories:
So for example, if I had no directories in hdfs and typed:
hadoop fs -put myfile.txt /some/non/existing/path/myfile.txt
It would create all of the directories: some, non, existing and path and put the file in there.
Now, with a newer release of Hadoop (2.2.0), this auto-creation of directories is not happening. The same command yields:
put: `/some/non/existing/path/': No such file or directory
I have a workaround to just do hadoop fs -mkdir first, for every put, but this is not going to perform well.
Is this configurable? Any advice?
Upvotes: 26
Views: 66296
Reputation: 99
The put operation requires the target directory to exist beforehand; it does not create missing directories automatically. You must create the directory first, then run the put command.
To create nested directories in Hadoop, you can use the following command:
hadoop fs -mkdir -p <path/to/nested/directories>
or
hdfs dfs -mkdir -p <path/to/nested/directories>
If you need to delete nested directories in HDFS, you can use the recursive -r option with the rm command:
hadoop fs -rm -r <path/to/nested/directories>
or
hdfs dfs -rm -r <path/to/nested/directories>
These commands allow you to easily manage nested directories in HDFS!
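As a sanity check outside a cluster: the `-p` flag of `hdfs dfs -mkdir` and the `-r` flag of `hdfs dfs -rm` follow the same semantics as local POSIX `mkdir -p` and `rm -r`, so the round trip can be sketched locally (this is a stand-in demo, not run against HDFS):

```shell
# Stand-in for "hdfs dfs -mkdir -p" / "hdfs dfs -rm -r":
# the -p and -r flags behave the same way on the local filesystem.
base=$(mktemp -d)
mkdir -p "$base/path/to/nested/directories"    # creates every missing parent
[ -d "$base/path/to/nested/directories" ] && echo "created"
rm -r "$base/path/to/nested/directories"       # recursive delete
[ -d "$base/path/to/nested/directories" ] || echo "removed"
rm -rf "$base"
```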
Upvotes: 0
Reputation: 742
The put operation does not create the directory if it is not present. We need to create the directory before doing the put operation.
You can use following to create the directory.
hdfs dfs -mkdir -p <path>
-p
This creates any missing parent directories first. If a directory already exists, it does not print an error message and moves on to create the sub-directories.
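To see that idempotent behavior concretely: `-p` has the same contract as POSIX `mkdir -p`, so repeating the call on an existing path exits 0 instead of failing (a local sketch, not run against HDFS):

```shell
d=$(mktemp -d)
mkdir -p "$d/a/b/c"                      # creates a, a/b, and a/b/c
mkdir -p "$d/a/b/c" && echo "no error"   # path already exists: still exit 0
rm -rf "$d"
```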
Upvotes: 4
Reputation: 3043
EDITORIAL NOTE: WARNING THIS ANSWER IS INDICATED TO BE INCORRECT
hadoop fs ...
is deprecated; instead use:
hdfs dfs -mkdir ...
Upvotes: 2
Reputation: 4444
Placing a file into a nonexistent directory in hdfs requires a two-step process. As @rt-vybor stated, use the '-p' option of mkdir to create all missing path elements. But since the OP asked how to place the file into hdfs, the following also performs the hdfs put, and note that you can also (optionally) check that the put succeeded, and conditionally remove the local copy.
First create the relevant directory path in hdfs, and then put the file into hdfs. You want to check that the file exists prior to placing into hdfs. And you may want to log/show that the file has been successfully placed into hdfs. The following combines all the steps.
fn=myfile.txt
if [ -f "$fn" ] ; then
  bfn=$(basename "$fn")   # trim path from filename
  hdfs dfs -mkdir -p /here/is/some/non/existant/path/in/hdfs/
  hdfs dfs -put "$fn" /here/is/some/non/existant/path/in/hdfs/"$bfn"
  hdfs dfs -ls /here/is/some/non/existant/path/in/hdfs/"$bfn" >/dev/null
  success=$?   # exit status 0 means the file landed in hdfs
  if [ $success -eq 0 ] ; then
    echo "remove local copy of file $fn"
    #rm -f "$fn"   # uncomment if you want to remove the file
  fi
fi
And you can turn this into a shell script that takes an hdfs path and a list of files (and creates the path only once):
#!/bin/bash
hdfsp=${1}
shift
hdfs dfs -mkdir -p "$hdfsp"   # create the target path once
for fn in "$@"; do
  if [ -f "$fn" ] ; then
    bfn=$(basename "$fn")   # trim path from filename
    hdfs dfs -put "$fn" "$hdfsp/$bfn"
    hdfs dfs -ls "$hdfsp/$bfn" >/dev/null
    success=$?   # exit status 0 means the file landed in hdfs
    if [ $success -eq 0 ] ; then
      echo "remove local copy of file $fn"
      #rm -f "$fn"   # uncomment if you want to remove the file
    fi
  fi
done
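The exit-status guard in the script is the part worth keeping even outside Hadoop: only delete the local copy after the listing of the uploaded path succeeds. A minimal local sketch of that pattern, using `ls` on a temp file as a stand-in for the `hdfs dfs -ls` check:

```shell
tmp=$(mktemp)            # pretend this is the local file we just uploaded
ls "$tmp" >/dev/null     # stand-in for: hdfs dfs -ls <path-in-hdfs>
if [ $? -eq 0 ] ; then   # exit status 0 means the listing (the upload) succeeded
  echo "safe to remove local copy"
  rm -f "$tmp"
fi
```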
Upvotes: 1