Reputation: 1953
I am writing a Java app that collects the file metadata of every file in a directory and exports it to a CSV file. The app works fine when the number of files is small, but if I feed it a path with around 320,000 files across all directories and sub-directories, it takes forever. Is there a way I can speed things up here?
private void extractDetailsCSV(File libSourcePath, String extractFile) throws ScraperException {
    log.info("Inside extract details csv");
    try {
        FileMetadataUtil fileUtil = new FileMetadataUtil();
        File[] listOfFiles = libSourcePath.listFiles();
        if (listOfFiles == null) { // listFiles() returns null on I/O error or non-directory
            return;
        }
        for (int i = 0; i < listOfFiles.length; i++) {
            if (listOfFiles[i].isDirectory()) {
                extractDetailsCSV(listOfFiles[i], extractFile);
            }
            if (listOfFiles[i].isFile()) {
                ScraperOutputVO so = new ScraperOutputVO();
                Path path = Paths.get(listOfFiles[i].getAbsolutePath());
                so.setFilePath(listOfFiles[i].getParent());
                so.setFileName(listOfFiles[i].getName());
                so.setFileType(getFileType(listOfFiles[i].getAbsolutePath()));
                BasicFileAttributes basicAttribs = fileUtil.getBasicFileAttributes(path);
                if (basicAttribs != null) {
                    so.setDateCreated(basicAttribs.creationTime().toString().substring(0, 10) + " " + basicAttribs.creationTime().toString().substring(11, 16));
                    so.setDateLastModified(basicAttribs.lastModifiedTime().toString().substring(0, 10) + " " + basicAttribs.lastModifiedTime().toString().substring(11, 16));
                    so.setDateLastAccessed(basicAttribs.lastAccessTime().toString().substring(0, 10) + " " + basicAttribs.lastAccessTime().toString().substring(11, 16));
                }
                so.setFileSize(String.valueOf(listOfFiles[i].length()));
                so.setAuthors(fileUtil.getOwner(path));
                so.setFolderLink(listOfFiles[i].getAbsolutePath());
                writeCsvFileDtl(extractFile, so);
                noOfFiles++;
            }
        }
    } catch (Exception e) {
        log.error("IOException while setting up columns", e);
        throw new ScraperException("IOException while setting up columns", e);
    }
    log.info("Done extracting details to csv file");
}
public void writeCsvFileDtl(String extractFile, ScraperOutputVO scraperOutputVO) throws ScraperException {
    try {
        FileWriter writer = new FileWriter(extractFile, true);
        writer.append(scraperOutputVO.getFilePath());
        writer.append(',');
        writer.append(scraperOutputVO.getFileName());
        writer.append(',');
        writer.append(scraperOutputVO.getFileType());
        writer.append(',');
        writer.append(scraperOutputVO.getDateCreated());
        writer.append(',');
        writer.append(scraperOutputVO.getDateLastModified());
        writer.append(',');
        writer.append(scraperOutputVO.getDateLastAccessed());
        writer.append(',');
        writer.append(scraperOutputVO.getFileSize());
        writer.append(',');
        writer.append(scraperOutputVO.getAuthors());
        writer.append(',');
        writer.append(scraperOutputVO.getFolderLink());
        writer.append('\n');
        writer.flush();
        writer.close();
    } catch (IOException e) {
        log.error("IOException while writing to csv file", e);
        throw new ScraperException("IOException while writing to csv file", e);
    }
}
}
Upvotes: 0
Views: 1158
Reputation: 18403
If you are using Java 7, you can rewrite this with the file-tree-walking interface (Files.walkFileTree) to check whether this is a filesystem problem or whether the problem is in your code (maybe you are using a data structure with poor performance, or you are running out of memory and the program slows down during execution).
EDIT:
This line
File[] listOfFiles = libSourcePath.listFiles();
will create an array of 320k objects in memory, which is a good recipe for poor performance (or an OutOfMemoryError).
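As an illustration (my own sketch, not the asker's code): Files.newDirectoryStream, also available since Java 7, iterates directory entries lazily instead of materializing the whole File[] array up front:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LazyListing {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Entries are fetched lazily as the loop advances; only one
        // Path object at a time needs to be live, not 320k of them.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                System.out.println(entry.getFileName());
            }
        }
    }
}
```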
And the second problem:
FileWriter writer = new FileWriter(extractFile, true);
you are opening/closing the CSV file every single time you write one file's metadata!
You have to work in a manner like this:
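A minimal sketch of that approach (using made-up class names, not the asker's ScraperOutputVO/writeCsvFileDtl): walk the tree with Files.walkFileTree, which hands you the BasicFileAttributes for free on each visit, and keep a single buffered writer open for the whole run instead of reopening the CSV per file:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class MetadataExtractor {
    public static void extract(Path root, Path csv) throws IOException {
        // One writer for the whole walk: opened once, closed once.
        try (final BufferedWriter writer =
                Files.newBufferedWriter(csv, StandardCharsets.UTF_8)) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // attrs is supplied by the walk itself, so there is no
                    // extra attribute lookup per file.
                    writer.write(file.getParent() + "," + file.getFileName() + ","
                            + attrs.creationTime() + "," + attrs.lastModifiedTime() + ","
                            + attrs.lastAccessTime() + "," + attrs.size());
                    writer.newLine();
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        extract(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The buffered writer batches many rows per actual write syscall, and walkFileTree handles the recursion into sub-directories for you.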
Upvotes: 0
Reputation: 223023
Many filesystems are not efficient at handling directories with that many entries in them. There's very little you can do, codewise, to fix that. You need to try to move those files into multiple directories, to get better speed.
Other possible reasons for slowness are that you're either using a data structure that takes O(n) for each entry (resulting in O(n²) total runtime), or you're running out of heap space (so that GC dominates runtime).
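To illustrate the O(n) per-entry point with a hypothetical example (nothing in the posted code necessarily does this): a per-entry membership check against an ArrayList scans the whole list each time, giving O(n²) overall, while a HashSet keeps each check O(1) on average:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QuadraticDemo {
    public static void main(String[] args) {
        int n = 50_000;

        // O(n^2) overall: List.contains is a linear scan per entry.
        List<String> seenList = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (!seenList.contains("file" + i)) {
                seenList.add("file" + i);
            }
        }
        long listMs = (System.nanoTime() - t0) / 1_000_000;

        // O(n) overall: HashSet.add is amortized constant time.
        Set<String> seenSet = new HashSet<>();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            seenSet.add("file" + i);
        }
        long setMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("list: " + listMs + " ms, set: " + setMs + " ms");
    }
}
```

With 320k entries the difference between these two shapes is the difference between seconds and hours.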
Upvotes: 1