Reputation: 486
I have a list of JSON strings, each of which contains a list of movies. I need to collect those movies, process them, and store them on disk. I am thinking of using a parallel stream to collect the movies and test its performance. My approach is this:
The following method produces a List of Movies.
protected abstract List<T> parseJsonString(JsonIterator iter);
This method contains a parallel stream that collects a List of all the Lists (`List<List<Movie>>`) produced in the stream:
public CompletableFuture<List<List<T>>> parseJsonPages(List<CompletableFuture<String>> jsonPageList)
{
    return jsonPageList.parallelStream()
            .map( jsonPageStr -> CompletableFuture.supplyAsync( () -> {
                try {
                    return parseJsonString( JsonIterator.parse( jsonPageStr.get() ) );
                }
                catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                    System.exit(-1);
                }
                return null;
            } ) )
            .collect( ParallelCollectors.toFuture( Collectors.toList() ) );
}
The problem with this approach is that the stream produces the lists of movies and then appends each whole list to an outer list. Do you think this is an effective way of collecting all those movies? Should I merge the movies from all lists into one list, instead of just appending the entire lists inside a list (even though this also costs some time)? If so, how do I perform such a task?
Upvotes: 1
Views: 936
Reputation: 340118
In the future, when Project Loom arrives with its virtual threads, it will be much simpler, and likely much faster, to simply assign each task to a virtual thread.
Preliminary builds of Project Loom are available now, built on early-access Java 16. They are subject to change and not ready for production, but if this is a non-mission-critical personal project, you might consider using them now.
By the way, your `Movie` class might be suitable to define as a record, one of the features coming in Java 16.
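A record could look like this; the field names here are assumptions, since your `Movie` class was not shown:

```java
// Hypothetical fields — adjust to match your actual Movie data.
// A record auto-generates the constructor, accessors, equals, hashCode, and toString,
// which matters if you collect movies into a Set to eliminate duplicates.
record Movie( String title , int year ) { }
```

Because `equals`/`hashCode` are generated from the fields, `new Movie( "Alien" , 1979 )` equals another `Movie` with the same field values, so a `Set< Movie >` will drop duplicate entries.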
List< String > inputListsOfMoviesAsJson = … ;  // Input.
Set< Movie > movies = Collections.synchronizedSet( new HashSet<>() ) ;  // Output. Thread-safe, and effectively final so the lambda below can capture it.
try
(
    ExecutorService executorService = Executors.newVirtualThreadExecutor() ;
)
{
    for ( String inputJson : inputListsOfMoviesAsJson )
    {
        Runnable task = () -> movies.addAll( this.parseJsonIntoSetOfMovies( inputJson ) ) ;
        executorService.submit( task ) ;
    }
}
// At this point, flow-of-control blocks until all tasks are done.
// Then the executor service is automatically shut down as part of being closed, as an `AutoCloseable` in a try-with-resources.
… use your `Set` of `Movie` objects.
If you want to track success/failure, then capture and collect the `Future` object returned by each call to `executorService.submit( task )`. The code above ignores that return value, for simplicity of the demo.
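That variation could look like the sketch below. The parser is a hypothetical stand-in, and a fixed platform-thread pool is used so it runs on current JDKs; on a Loom build you would swap in the virtual-thread executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.*;

class CollectWithFutures
{
    // Hypothetical stand-in for your real JSON parser.
    static Set< String > parseJsonIntoSetOfMovies( String json )
    {
        return Set.of( json.toUpperCase() );
    }

    static Set< String > collectAll( List< String > inputs ) throws Exception
    {
        Set< String > movies = ConcurrentHashMap.newKeySet();
        ExecutorService executorService = Executors.newFixedThreadPool( 4 );
        try
        {
            // Capture every Future instead of discarding it.
            List< Future< ? > > futures = new ArrayList<>();
            for ( String inputJson : inputs )
            {
                futures.add( executorService.submit(
                        () -> movies.addAll( parseJsonIntoSetOfMovies( inputJson ) ) ) );
            }
            // `get` blocks until each task is done, and rethrows the task's
            // exception (wrapped in ExecutionException) if the task failed.
            for ( Future< ? > f : futures ) { f.get(); }
        }
        finally
        {
            executorService.shutdown();
        }
        return movies;
    }

    public static void main( String[] args ) throws Exception
    {
        System.out.println( collectAll( List.of( "alien" , "dune" , "heat" ) ) );
    }
}
```

Calling `f.get()` on each captured `Future` is what turns a silently failed task into a visible `ExecutionException` you can log or handle.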
As to your question about accumulating a list of resulting `Movie` objects versus merging later: I do not think collecting those objects will be a bottleneck. My guess is that processing the JSON will be the bottleneck. Either way, verifying your actual bottlenecks with profiler tools will likely be easier with the simpler coding possible when using Project Loom.
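If you do keep your current `List<List<Movie>>` shape, merging afterward is cheap: a single `flatMap` pass. A sketch, with `String` standing in for `Movie`:

```java
import java.util.List;
import java.util.stream.Collectors;

class FlattenPages
{
    // Merge a list of lists into one flat list, preserving order.
    static < T > List< T > flatten( List< List< T > > nested )
    {
        return nested.stream()
                     .flatMap( List::stream )          // Stream every inner element.
                     .collect( Collectors.toList() );
    }

    public static void main( String[] args )
    {
        List< List< String > > pages =
                List.of( List.of( "Alien" , "Dune" ) , List.of( "Heat" ) );
        System.out.println( flatten( pages ) ); // [Alien, Dune, Heat]
    }
}
```

In your `CompletableFuture` pipeline, the same idea applies without blocking: merge the nested lists with `.thenApply( FlattenPages::flatten )` on the returned future.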
In the code above, I use a `Set` made thread-safe by a call to `Collections.synchronized…`. You could try various implementations of `Set` or `List`. A list might be faster, but a set has the benefit of eliminating duplicates, if that is an issue in your data inputs.
This approach assumes you have plenty of memory to handle all the JSON work. With virtual threads, all of those inputs might be getting processed at nearly the same time.
So if memory is a constrained resource, you’ll need to take further measures to prevent too many virtual threads from starting the JSON processing.
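One such measure is a `Semaphore` gate: each task acquires a permit before parsing, so no more than N parses run at once no matter how many threads the executor spawns. A sketch (hypothetical stand-in parser; a cached platform-thread pool so it runs on current JDKs — the same gate works unchanged with a virtual-thread executor):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.*;

class ThrottledParsing
{
    static final Semaphore PERMITS = new Semaphore( 4 ); // At most 4 parses in flight.

    // Hypothetical stand-in for memory-hungry JSON parsing.
    static Set< String > parseJsonIntoSetOfMovies( String json )
    {
        return Set.of( json );
    }

    static Set< String > run( List< String > inputs ) throws Exception
    {
        Set< String > movies = ConcurrentHashMap.newKeySet();
        ExecutorService exec = Executors.newCachedThreadPool();
        List< Future< ? > > futures = new ArrayList<>();
        for ( String json : inputs )
        {
            futures.add( exec.submit( () -> {
                PERMITS.acquire();               // Block until a permit is free.
                try
                {
                    movies.addAll( parseJsonIntoSetOfMovies( json ) );
                }
                finally
                {
                    PERMITS.release();           // Always give the permit back.
                }
                return null;
            } ) );
        }
        for ( Future< ? > f : futures ) { f.get(); }
        exec.shutdown();
        return movies;
    }

    public static void main( String[] args ) throws Exception
    {
        System.out.println( run( List.of( "a" , "b" , "c" , "d" , "e" ) ) );
    }
}
```

The executor still accepts every task immediately; the semaphore merely parks the excess tasks cheaply until a permit frees up, which caps the peak memory used by in-flight parsing.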
Virtual threads (fibers) are appropriate for work that involves blocking code. For purely CPU-bound tasks such as video-encoding, conventional platform/kernel threads are best. If you are doing nothing but processing JSON text already loaded into memory, then virtual threads may not show a benefit if they turn out to be CPU-bound. But I’d give it a try, as a test run is so easy. If you are doing any I/O (logging, accessing files, hitting a database, making network calls) then you will definitely see dramatic performance improvements with virtual threads.
Be sure your JSON processing library is built to be thread-safe.
And be sure your `parseJsonIntoSetOfMovies` method is thread-safe.
Read the book Java Concurrency in Practice by Brian Goetz et al.
Upvotes: 1