Nikhil Das Nomula

Reputation: 1953

Java - Get metadata of files in a directory with million files in it

I am writing a Java app that reads the metadata of the files in a directory and exports it to a CSV file. The app works fine when the number of files is small. But if I feed in a path that has around 320,000 files across all of its directories and sub-directories, it takes forever. Is there a way I can speed things up here?

    private void extractDetailsCSV(File libSourcePath, String extractFile) throws ScraperException {

    log.info("Inside extract details csv");

    try{
        FileMetadataUtil fileUtil = new FileMetadataUtil();

        File[] listOfFiles = libSourcePath.listFiles();

        for(int i = 0; i < listOfFiles.length; i++) {

            if(listOfFiles[i].isDirectory()) {
                extractDetailsCSV(listOfFiles[i],extractFile);
            }

            if(listOfFiles[i].isFile()){

                ScraperOutputVO so = new ScraperOutputVO();

                Path path = Paths.get(listOfFiles[i].getAbsolutePath());

                so.setFilePath(listOfFiles[i].getParent());
                so.setFileName(listOfFiles[i].getName());

                so.setFileType(getFileType(listOfFiles[i].getAbsolutePath()));

                BasicFileAttributes basicAttribs = fileUtil.getBasicFileAttributes(path);
                if(basicAttribs != null) {
                    so.setDateCreated(basicAttribs.creationTime().toString().substring(0, 10) + " " + basicAttribs.creationTime().toString().substring(11, 16));
                    so.setDateLastModified(basicAttribs.lastModifiedTime().toString().substring(0, 10) + " " + basicAttribs.lastModifiedTime().toString().substring(11, 16));
                    so.setDateLastAccessed(basicAttribs.lastAccessTime().toString().substring(0, 10) + " " + basicAttribs.lastAccessTime().toString().substring(11, 16));
                }

                so.setFileSize(String.valueOf(listOfFiles[i].length()));
                so.setAuthors(fileUtil.getOwner(path));

                so.setFolderLink(listOfFiles[i].getAbsolutePath());
                writeCsvFileDtl(extractFile, so);

                so.setFileName(listOfFiles[i].getName());
                noOfFiles ++;
            }
        }
    } catch (Exception e) {
        log.error("IOException while setting up columns" + e.fillInStackTrace());
        throw new ScraperException("IOException while setting up columns" , e.fillInStackTrace());
    }

    log.info("Done extracting details to csv file");
}

public void writeCsvFileDtl(String extractFile, ScraperOutputVO scraperOutputVO) throws ScraperException {
    try {
        FileWriter writer = new FileWriter(extractFile, true);
        writer.append(scraperOutputVO.getFilePath());
        writer.append(',');
        writer.append(scraperOutputVO.getFileName());
        writer.append(',');
        writer.append(scraperOutputVO.getFileType());
        writer.append(',');
        writer.append(scraperOutputVO.getDateCreated());
        writer.append(',');
        writer.append(scraperOutputVO.getDateLastModified());
        writer.append(',');
        writer.append(scraperOutputVO.getDateLastAccessed());
        writer.append(',');
        writer.append(scraperOutputVO.getFileSize());
        writer.append(',');
        writer.append(scraperOutputVO.getAuthors());
        writer.append(',');
        writer.append(scraperOutputVO.getFolderLink());
        writer.append('\n');
        writer.flush();
        writer.close();
    } catch (IOException e) {
        log.info("IOException while writing to csv file" + e.fillInStackTrace());
        throw new ScraperException("IOException while writing to csv file" , e.fillInStackTrace());

    }
}

}

Upvotes: 0

Views: 1158

Answers (2)

Luca Basso Ricci

Reputation: 18403

If you are using Java 7, you can rewrite this with the Files.walkFileTree API (the FileVisitor interface) to check whether this is a filesystem problem or a problem in your code (maybe you are using a data structure with poor performance, or you are running out of memory and the program slows down during execution).

EDIT:
This line

File[] listOfFiles = libSourcePath.listFiles();

builds an array holding every entry of the directory in memory; with up to 320k objects, that is a good way to get poor performance (or an OutOfMemoryError).
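As a rough illustration of avoiding that array, here is a minimal sketch that iterates over a directory lazily with Java 7's `Files.newDirectoryStream`. The class and method names (`LazyListing`, `countEntries`) are hypothetical and only serve the example; the loop body is where per-file work would go.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original code.
public class LazyListing {

    /** Counts the entries of one directory without building a File[] of all of them. */
    public static long countEntries(Path dir) throws IOException {
        long count = 0;
        // DirectoryStream hands out entries one at a time as the loop advances,
        // and try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                count++; // per-entry work would go here
            }
        }
        return count;
    }
}
```

This is not recursive; it only shows the lazy iteration for a single directory.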

The second problem is this line:

FileWriter writer = new FileWriter(extractFile, true);

You are opening and closing the CSV file every time you write a file's metadata!

You should work along these lines:

  1. Open the CSV FileWriter once
  2. Use Files.walkFileTree (Java 7) or DirectoryWalker from Apache Commons IO (earlier versions) to visit every directory recursively
  3. For every file you encounter while walking the directory tree (previous point), write its metadata to the CSV (and flush the CSV file if you want)
  4. Close the CSV file
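The steps above can be sketched roughly as follows for Java 7. This is only an outline under some assumptions: the class and method names (`MetadataCsvExporter`, `export`) are made up for the example, the CSV file is assumed to live outside the tree being walked, only a subset of the question's columns is written, and no CSV escaping is done.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical class name; a sketch, not the asker's actual scraper.
public class MetadataCsvExporter {

    /** Walks root recursively, writes one CSV row per regular file, returns the row count. */
    public static long export(Path root, Path csvFile) throws IOException {
        final long[] count = {0};
        // Step 1: open the CSV writer once, for the whole walk.
        try (BufferedWriter writer = Files.newBufferedWriter(csvFile, StandardCharsets.UTF_8)) {
            // Step 2: let walkFileTree handle the recursion.
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // Step 3: the attributes are passed in by the walk,
                    // so no separate attribute lookup is needed per file.
                    if (attrs.isRegularFile()) {
                        writer.write(file.getParent() + "," + file.getFileName() + ","
                                + attrs.size() + "," + attrs.creationTime() + ","
                                + attrs.lastModifiedTime() + "," + attrs.lastAccessTime());
                        writer.newLine();
                        count[0]++;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } // Step 4: try-with-resources flushes and closes the CSV file here.
        return count[0];
    }
}
```

Note that `SimpleFileVisitor` rethrows the exception for any file it cannot read; overriding `visitFileFailed` to log and continue may be preferable for a large tree.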

Upvotes: 0

C. K. Young

Reputation: 223023

Many filesystems are not efficient at handling directories with that many entries in them. There's very little you can do, codewise, to fix that. You need to try to move those files into multiple directories, to get better speed.

Other possible reasons for slowness are that you're either using a data structure that takes O(n) for each entry (resulting in O(n²) total runtime), or you're running out of heap space (so that GC dominates runtime).

Upvotes: 1
