All_Safe

Reputation: 1399

Reading multiple files in multithreaded mode

I have an ArrayList that contains about 20,000 file paths.

private List<Path> listOfPaths = new ArrayList<>();

I want to read the contents of the files at these paths using multiple threads. The problem is that my code currently runs quite slowly. How can I use several threads so that each one reads a file and writes its contents to a DTO? And how do I make sure that once one thread has started processing a file, no other thread processes the same file?

Upvotes: 0

Views: 3691

Answers (2)

miskender

Reputation: 7938

I created ioPool so that the I/O operations do not block the common pool (which parallel stream operations use by default). A common rule of thumb for I/O-bound work is to create about twice as many threads as there are cores, but the throughput is ultimately limited by I/O, as others mentioned.

You can do it as shown below. Note that this does not process your file list in order.

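 ForkJoinPool ioPool = new ForkJoinPool(8);
 // listOfPaths is the list from the question
 ForkJoinTask<?> tasks = ioPool.submit(
         () -> listOfPaths.parallelStream().forEach(path -> { /* your code here */ }));
 tasks.get(); // blocks until the whole stream has been processed
 ioPool.shutdown(); // release the pool's threads when done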
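To make that concrete, here is a minimal self-contained sketch of the same idea. The class name IoPoolReader, the method readAll and the use of a ConcurrentHashMap to collect the results are my own assumptions for illustration, not part of the question. Also note that a parallel stream running in the pool it was submitted from is long-standing behavior of the JDK implementation rather than something the stream documentation promises.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, named only for this example.
public class IoPoolReader {

    // Reads every file on a dedicated pool and returns path -> contents.
    public static Map<Path, byte[]> readAll(List<Path> paths, int parallelism)
            throws InterruptedException, ExecutionException {
        // Thread-safe map, because several workers write to it concurrently.
        Map<Path, byte[]> contents = new ConcurrentHashMap<>();
        ForkJoinPool ioPool = new ForkJoinPool(parallelism);
        try {
            ioPool.submit(() -> paths.parallelStream().forEach(p -> {
                try {
                    contents.put(p, Files.readAllBytes(p));
                } catch (IOException e) {
                    // forEach cannot throw checked exceptions, so wrap it.
                    throw new UncheckedIOException(e);
                }
            })).get(); // blocks until every element has been processed
        } finally {
            ioPool.shutdown();
        }
        return contents;
    }
}
```

The stream hands each element to exactly one worker, so no file is read twice, and keying the map by path means the processing order does not matter.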

Upvotes: 1

Nicolas Bousquet

Reputation: 4000

You can likely split the work into smaller chunks, with each thread processing a portion of the files. Each thread would have its own sublist of paths to process and its own list of results, which avoids any risk of two threads reading or writing the same data at the same time. When all the threads have finished, you would collect the results.

Actually, you can let Java 8 parallel streams do the hard work of splitting, merging, etc. for you.
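For illustration, the manual chunking could look roughly like the sketch below. The names ChunkedReader and readInChunks are made up for this example, and threadCount is assumed to be positive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this example.
public class ChunkedReader {

    public static List<byte[]> readInChunks(List<Path> paths, int threadCount)
            throws InterruptedException {
        // Round up so that every path ends up in exactly one chunk.
        int chunkSize = Math.max(1, (paths.size() + threadCount - 1) / threadCount);
        List<List<byte[]>> partialResults = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();

        for (int start = 0; start < paths.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, paths.size());
            List<Path> chunk = paths.subList(start, end); // this thread's own sublist
            List<byte[]> out = new ArrayList<>();         // this thread's own result list
            partialResults.add(out);
            Thread t = new Thread(() -> {
                for (Path p : chunk) {
                    try {
                        out.add(Files.readAllBytes(p));
                    } catch (IOException e) {
                        // Ends this worker only; real code should report the failure.
                        throw new UncheckedIOException(e);
                    }
                }
            });
            threads.add(t);
            t.start();
        }

        // join() also makes each worker's writes visible to this thread.
        for (Thread t : threads) {
            t.join();
        }

        // Merge the per-thread lists; chunks were created in input order.
        List<byte[]> all = new ArrayList<>();
        for (List<byte[]> part : partialResults) {
            all.addAll(part);
        }
        return all;
    }
}
```

Because the sublists do not overlap, no two threads ever touch the same file, and no locking is needed.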

Using a standard (sequential) stream, without multiple threads:

List<ParamsDTO> paramsList = listOfPaths.stream().map(p -> readFile(p)).collect(Collectors.toList());

Using parallel streams for improved performance:

List<ParamsDTO> paramsList = listOfPaths.parallelStream().map(p -> readFile(p)).collect(Collectors.toList());

Where you have defined the function readFile as something like:

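public ParamsDTO readFile(Path p) {
    ParamsDTO params = new ParamsDTO();
    try {
        params.setParams(Files.readAllBytes(p));
    } catch (IOException e) {
        // A lambda passed to map() cannot throw checked exceptions
        throw new UncheckedIOException(e);
    }
    return params;
}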

In the long run you will likely want to go beyond that and control the level of parallelism depending on the type of disk. For more control, use the Java 5 executors to manage the thread pool's characteristics, with plain Runnable or Future tasks for the work to run.
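As a rough sketch of the executor approach: the class name ExecutorFileReader and the method readAll are invented for this example, it returns raw byte arrays instead of your DTO, and the right thread count is something you would have to measure for your disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class, named only for this example.
public class ExecutorFileReader {

    // Reads every path on a fixed-size pool; results come back in input order.
    public static List<byte[]> readAll(List<Path> paths, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per file, so each file is read by exactly one thread.
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (Path p : paths) {
                tasks.add(() -> Files.readAllBytes(p)); // a Callable may throw IOException
            }
            List<byte[]> results = new ArrayList<>();
            // invokeAll waits for all tasks and keeps the order of the task list.
            for (Future<byte[]> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // throws ExecutionException if a read failed
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Here the pool size is the single knob for parallelism, so you can try a small value for a spinning disk and a larger one for an SSD and compare timings.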

Upvotes: 1
