Reputation: 955
I am having following directory structure in HDFS,
/analysis/alertData/logs/YEAR/MONTH/DATE/HOURS
That is data is coming on houly basis and stored in format of year/month/day/hour.
I have written a shell script in which i am passing path till
"/analysis/alertData/logs" ( this will vary depending on what product of data i am handling)
then shell script go through the year/month/date/hour folders and return the most latest path.
For example:
Directories present in HDFS has following structure:
/analysis/alertData/logs/2014/10/22/01
/analysis/alertData/logs/2013/5/14/04
shell script is given path till : " /analysis/alertData/logs "
it outputs most recent directory : /analysis/alertData/logs/2014/10/22/01
My question is here is how can i validate whether HDFS directory path pass to shell script is valid or not. Lets say i pass a wrong path as input or path which does not exist so how to handle that in shell script.
Sample wrong path can be:
wrong path : /analysis/alertData ( correct path : /analysis/alertData/logs/ )
wrong path : /abc/xyz/ ( path does not exit in HDFS )
I tried using Hadoop dfs -test -z/-d/-e options did not worked for me. Any suggestion for this.
NOTE : Not posting my original code here, as solution to my problem does not depend on it.
Thanks in advance.
Upvotes: 17
Views: 67652
Reputation: 31
works for scala with spark.
import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val fileExists = fs.exists(new Path(<HDFSPath>)) //return boolean of true or false
Upvotes: 0
Reputation: 327
Since
hdfs dfs -test -d $yourdir
return 0 if exists, then
if [ $? == 0 ]; then
echo "exists"
else
echo "dir does not exists"
fi
Upvotes: 19
Reputation: 3661
Hadoop fs is deprecated Usage: hdfs dfs -test -[ezd] URI
Options: The -e option will check to see if the file exists, returning 0 if true. The -z option will check to see if the file is zero length, returning 0 if true. The -d option will check to see if the path is directory, returning 0 if true. Example: hdfs dfs -test -d $yourdir
Please check the following for more info: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html Regards
Upvotes: 7
Reputation: 1300
Try w/o test command []:
if $(hadoop fs -test -d $yourdir) ; then echo "ok";else echo "not ok"; fi
Upvotes: 33
Reputation: 1327
Hi I have used following script to test the HDFS directory exists or not. I have seen in your question that you tried this test command and not worked. Could you please provide any trace on why this not working..
hadoop fs -test -d $dirpath
if [ $? != 0 ]
then
hadoop fs -mkdir $dirpath
else
echo "Directory already present in HDFS"
fi
Upvotes: 5