Reputation: 1399
I have an ArrayList<Path>. It contains about 20,000 file path elements:
private List<Path> listOfPaths = new ArrayList<>();
I want to read the contents of the files at these paths using multiple threads.
The problem is that this code runs quite slowly. How can I use several threads so that each of them reads a file and writes its contents to the DTO? And how do I make sure that once one thread starts processing a file, no other thread processes the same file?
Reputation: 7938
I created ioPool so that the I/O operations don't block the common pool (which parallel stream operations use by default). For I/O-bound work a common rule of thumb is to create core count × 2 threads, but as others have mentioned, the speedup is ultimately limited by the disk I/O itself.
You can do it as shown below. Note that this won't process your file list in order.
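import java.util.concurrent.*;
ForkJoinPool ioPool = new ForkJoinPool(8); // dedicated pool keeps blocking I/O off the common pool
ForkJoinTask<?> tasks = ioPool.submit(
        () -> listOfPaths.parallelStream().forEach(path -> {
            // your code here: read the file at `path` into a DTO
        }));
tasks.get(); // blocks until all tasks finish; throws InterruptedException/ExecutionException
ioPool.shutdown(); // release the pool's threads once the work is done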
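A fuller, self-contained sketch of the same idea follows. It is only an illustration under assumptions: IoPoolExample, FileDto (path plus raw bytes) and readAll are hypothetical names standing in for your own DTO and code, and the pool is sized with the core count × 2 rule of thumb mentioned above. It collects the DTOs instead of using forEach. Keep in mind that running a parallel stream inside a custom ForkJoinPool relies on a JDK implementation detail (subtasks are forked into the pool that started the terminal operation), not on documented Stream API behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class IoPoolExample {

    // Hypothetical DTO: a file's path plus its raw contents.
    static final class FileDto {
        final Path path;
        final byte[] content;

        FileDto(Path path, byte[] content) {
            this.path = path;
            this.content = content;
        }
    }

    // Reads one file; wraps the checked IOException so it can escape a stream lambda.
    static FileDto read(Path p) {
        try {
            return new FileDto(p, Files.readAllBytes(p));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<FileDto> readAll(List<Path> paths) throws InterruptedException, ExecutionException {
        // Rule of thumb for I/O-bound work: about 2 threads per core.
        ForkJoinPool ioPool = new ForkJoinPool(Runtime.getRuntime().availableProcessors() * 2);
        try {
            // Started from a task running in ioPool, the parallel stream forks its
            // subtasks into ioPool rather than the common pool (JDK implementation detail).
            Callable<List<FileDto>> job = () -> paths.parallelStream()
                    .map(IoPoolExample::read)
                    .collect(Collectors.toList());
            return ioPool.submit(job).get(); // blocks until every file has been read
        } finally {
            ioPool.shutdown();
        }
    }
}
```

Although the files are read in no particular order, collect(Collectors.toList()) on a parallel stream over an ArrayList still returns the DTOs in the same order as the input list.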
Reputation: 4000
You can likely split the work into smaller chunks, with each thread processing a portion of the files. Each thread would have its own sub list of data to process and its own list of processed data, which avoids any risk of two threads reading or writing the same data at the same time. When all the threads have finished, you collect the results.
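The manual chunking approach just described could be sketched with plain threads roughly as follows. This is an illustration, not a definitive implementation: ChunkedReader, processInChunks and the work function are hypothetical names, and work stands in for whatever reads one file into your DTO. Because the sub lists are disjoint, each file is handled by exactly one thread, which addresses the concern about two threads processing the same file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ChunkedReader {

    // Splits `items` into up to `threadCount` contiguous sub lists, processes each
    // sub list on its own thread into its own result list, then concatenates the
    // partial results in the original order. Assumes threadCount >= 1.
    static <T, R> List<R> processInChunks(List<T> items, int threadCount, Function<T, R> work)
            throws InterruptedException {
        int chunkSize = (items.size() + threadCount - 1) / threadCount; // ceiling division
        List<Thread> threads = new ArrayList<>();
        List<List<R>> partialResults = new ArrayList<>();

        for (int start = 0; start < items.size(); start += chunkSize) {
            // Disjoint sub list: no item is ever given to two threads.
            List<T> chunk = items.subList(start, Math.min(start + chunkSize, items.size()));
            List<R> out = new ArrayList<>(); // written only by the thread below
            partialResults.add(out);
            Thread t = new Thread(() -> {
                for (T item : chunk) {
                    out.add(work.apply(item));
                }
            });
            threads.add(t);
            t.start();
        }

        // Wait for every thread; join() also makes each thread's writes visible here.
        for (Thread t : threads) {
            t.join();
        }

        List<R> results = new ArrayList<>(items.size());
        for (List<R> part : partialResults) {
            results.addAll(part);
        }
        return results;
    }
}
```

The work function must not throw checked exceptions (wrap an IOException in UncheckedIOException), and this sketch does no error handling: an exception in one thread only ends that thread and leaves its partial result incomplete.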
Actually, you can let Java 8 parallel streams do the hard work of splitting and merging for you.
Using a standard (sequential) stream, which does not use multiple threads:
List<ParamsDTO> paramsList = listOfPaths.stream().map(p -> readFile(p)).collect(Collectors.toList());
Using parallel streams for improved performance:
List<ParamsDTO> paramsList = listOfPaths.parallelStream().map(p -> readFile(p)).collect(Collectors.toList());
Here, readFile is a method defined along these lines:
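// needs java.io.IOException, java.io.UncheckedIOException and java.nio.file.Files
public ParamsDTO readFile(Path p) {
    ParamsDTO params = new ParamsDTO();
    try {
        params.setParams(Files.readAllBytes(p)); // reads the whole file into memory
    } catch (IOException e) {
        throw new UncheckedIOException(e); // checked exceptions can't be thrown from the map() lambda
    }
    return params;
}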
In the long run you'll likely want to go beyond that: control the level of parallelism depending on the type of disk, and, for more control, use Java 5 executors to manage the thread pool's characteristics, with plain Runnable or Callable tasks and Futures to collect their results.
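Following the executor suggestion, here is a minimal sketch with a fixed-size ExecutorService and one Callable per file. ExecutorReader, FileDto and readAll are hypothetical names, and the files are assumed to be UTF-8 text; swap in your own DTO. A Callable (unlike a Runnable) may throw the checked IOException from Files.readAllBytes, and Future.get() rethrows it wrapped in an ExecutionException.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorReader {

    // Hypothetical DTO: a file's path plus its contents decoded as UTF-8 text.
    static final class FileDto {
        final Path path;
        final String text;

        FileDto(Path path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    static List<FileDto> readAll(List<Path> paths, int threadCount)
            throws InterruptedException, ExecutionException {
        // Fixed-size pool: threadCount is the "level of parallelism" knob.
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            // One Callable per path. Each path is submitted exactly once,
            // so no two threads ever read the same file.
            List<Callable<FileDto>> tasks = new ArrayList<>();
            for (Path p : paths) {
                tasks.add(() -> new FileDto(p, new String(Files.readAllBytes(p), StandardCharsets.UTF_8)));
            }
            // invokeAll waits for all tasks; the futures are in the same order as the tasks.
            List<FileDto> results = new ArrayList<>(paths.size());
            for (Future<FileDto> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // an IOException from a task resurfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

threadCount is where the disk type comes in: a small value may suit a single spinning disk and a larger one faster storage, but measuring on the target machine is the reliable way to choose it.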