Reputation: 795
I am trying to list all the directories and files in HDFS using Java.
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for (FileStatus status : fileStatus) {
    System.out.println(status.getPath().toString());
}
My code is able to create the fs object, but it gets stuck on line 3, where it tries to read the folders and files. I am using AWS.
Please help me resolve the issue.
Upvotes: 6
Views: 19627
Reputation: 201
Check the following method, which gets the list of files using either a recursive or a non-recursive approach. To get a list of directories instead, you can change the code so that it adds directory paths to the resulting list rather than file paths. Look at the fs.isDirectory() if-else clauses in the code to see where directory paths can be extracted; the FileStatus class also has an isDirectory() method to check whether a FileStatus instance refers to a directory (a sketch of that directory-listing variation appears after the method below).
//helper method to get the list of files from the HDFS path
public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration,
                                                 String hdfsPath,
                                                 boolean recursive) throws IOException,
                                                                           IllegalArgumentException
{
    //resulting list of files
    List<String> filePaths = new ArrayList<String>();

    //get path from string and then the filesystem
    Path path = new Path(hdfsPath); //throws IllegalArgumentException
    FileSystem fs = path.getFileSystem(hadoopConfiguration);

    //if recursive approach is requested
    if (recursive)
    {
        //(heap issues with recursive approach) => using a queue
        Queue<Path> fileQueue = new LinkedList<Path>();

        //add the obtained path to the queue
        fileQueue.add(path);

        //while the fileQueue is not empty
        while (!fileQueue.isEmpty())
        {
            //get the file path from queue
            Path filePath = fileQueue.remove();

            //filePath refers to a file
            if (fs.isFile(filePath))
            {
                filePaths.add(filePath.toString());
            }
            else //else filePath refers to a directory
            {
                //list paths in the directory and add to the queue
                FileStatus[] fileStatuses = fs.listStatus(filePath);
                for (FileStatus fileStatus : fileStatuses)
                {
                    fileQueue.add(fileStatus.getPath());
                } // for
            } // else
        } // while
    } // if
    else //non-recursive approach => no heap overhead
    {
        //if the given hdfsPath is actually a directory
        if (fs.isDirectory(path))
        {
            FileStatus[] fileStatuses = fs.listStatus(path);

            //loop all file statuses
            for (FileStatus fileStatus : fileStatuses)
            {
                //if the given status is a file, then update the resulting list
                if (fileStatus.isFile())
                    filePaths.add(fileStatus.getPath().toString());
            } // for
        } // if
        else //it is a file then
        {
            //return the one and only file path to the resulting list
            filePaths.add(path.toString());
        } // else
    } // else

    //close filesystem; no more operations
    fs.close();

    //return the resulting list
    return filePaths;
} // listFilesFromHDFSPath
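As mentioned above, the same idea can be adapted to collect directory paths instead of file paths. Here is a minimal sketch of that variation for the non-recursive case, using the FileStatus.isDirectory() check described above (the method name listDirectoriesFromHDFSPath is just an illustrative choice, not part of the original code):

//sketch: list the immediate subdirectories under an HDFS path
public static List<String> listDirectoriesFromHDFSPath(Configuration hadoopConfiguration,
                                                       String hdfsPath) throws IOException
{
    List<String> dirPaths = new ArrayList<String>();

    Path path = new Path(hdfsPath);
    FileSystem fs = path.getFileSystem(hadoopConfiguration);

    //list the immediate children and keep only the directories
    for (FileStatus fileStatus : fs.listStatus(path))
    {
        if (fileStatus.isDirectory())
            dirPaths.add(fileStatus.getPath().toString());
    }

    fs.close();
    return dirPaths;
}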
Upvotes: 2
Reputation: 5891
This is working for me:
public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for (FileStatus status : fileStatus) {
        System.out.println(status.getPath().toString());
    }
}
Output:
hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase
I think you are giving an incorrect URI. Try doing it according to the code above.
If conf is not set up, then you have to add the resource files:
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));
Upvotes: 14