Mad Scientist

Reputation: 917

Faster way to collect into an ArrayList in Java

I have a directory with many files and want to filter out the ones with a certain name and save them in the fileList ArrayList. It works this way, but it takes a lot of time. Is there a way to make this faster?

String processingDir = "C:/Users/Ferid/Desktop/20181024";
String corrId = "00a3d321-171c-484a-ad7c-74e22ffa3625";
Path dirPath = Paths.get(processingDir);       

ArrayList<Path> fileList;

try (Stream<Path> paths = Files.walk(dirPath))
{           
    fileList = paths.filter(t -> (t.getFileName().toString().indexOf("EPX_" + 
    corrId + "_") >= 0)).collect(Collectors.toCollection(ArrayList::new));
}

Walking through the directory in the try condition does not take much time, but collecting into fileList does, and I cannot tell which operation exactly has this poor performance or how to improve it. (This is not the complete code, of course, just the relevant parts.)

Upvotes: 1

Views: 335

Answers (2)

Peter Lawrey

Reputation: 533660

If scanning the files each time is too slow you can build an index of files, either on startup or persisted and maintained as files change.

You could use a Watch Service to be notified when files are added or removed while the program is running.
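A minimal sketch of that idea, assuming a temporary directory as a stand-in for the processing directory (in real code you would update the in-memory index instead of printing, and loop rather than handle a single event):

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

public class DirWatcherDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("watch-demo"); // stand-in for the processing directory
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE);
            // simulate another process dropping a file into the directory
            Files.createFile(dir.resolve("EPX_demo-corr-id_file.txt"));
            WatchKey key = watcher.poll(20, TimeUnit.SECONDS); // wait for the event
            if (key != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path changed = (Path) event.context();
                    // here you would add/remove the entry in the in-memory index
                    System.out.println(event.kind() + ": " + changed);
                }
                key.reset(); // re-arm the key so further events are delivered
            }
        }
    }
}
```

Note that on some platforms (notably macOS) the default WatchService implementation polls, so events can take several seconds to arrive.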

This would be much faster to query as it would be entirely in memory. It would take the same amount of time to load the first time, but the index could be built in the background before you first need it.

e.g.

static Map<String, List<Path>> pathMap;
public static void initPathMap(String processingDir) throws IOException {
    try (Stream<Path> paths = Files.walk(Paths.get(processingDir))) {
        pathMap = paths.collect(Collectors.groupingBy(
                p -> getCorrId(p.getFileName().toString())));
    }
    pathMap.remove(""); // remove entries without a corrId.
}


private static String getCorrId(String fileName) {
    int start = fileName.indexOf("EPX_");
    if (start < 0)
        return "";
    int end = fileName.indexOf("_", start + 4);
    if (end < 0)
        return "";
    return fileName.substring(start + 4, end);
}

// later 
    String corrId = "00a3d321-171c-484a-ad7c-74e22ffa3625";
    List<Path> pathList = pathMap.get(corrId); // very fast.

You can make this code cleaner by writing the following; however, I wouldn't expect it to be much faster.

List<Path> fileList;

try (Stream<Path> paths = Files.walk(dirPath)) {
    String find = "EPX_" + corrId + "_"; // only build this string once
    fileList = paths.filter(t -> t.getFileName().toString().contains(find))
                    .collect(Collectors.toList());
}

The cost is in the time taken to scan the files of the directory. The cost of processing the file names is far, far less.

Using an SSD, or only scanning directories already cached in memory would speed this up dramatically.

One way to test this is to perform the operation more than once after a clean boot (so nothing is cached). How much longer the first run takes tells you how much time was spent loading the data from disk.
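A simple sketch of that measurement (the directory path is a placeholder; `count()` is used only to force the lazy walk to actually run):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ScanTiming {
    static long timeScanMillis(Path dir) throws IOException {
        long start = System.nanoTime();
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.count(); // terminal operation: forces the whole tree to be walked
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "."); // directory to test
        long first = timeScanMillis(dir);  // cold: may include disk reads after a clean boot
        long second = timeScanMillis(dir); // warm: directory metadata is now in the OS cache
        System.out.println("first=" + first + "ms second=" + second + "ms");
    }
}
```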

Upvotes: 2

user10527814

Reputation:

From java.nio.file.Files.walk(Path) api:

Return a Stream that is lazily populated with Path by walking the file tree rooted at a given starting file.

That's why it gives you the impression that "walking through the directory in the try condition is not taking much time".

Actually, most of the real work is done in collect, and it is not the collect mechanism's fault for being slow: the terminal operation is simply where the lazy walk is forced to run.

Upvotes: 4
