Reputation: 29972
While analyzing a colleague's Java 7 code from a few years ago, I found that he implemented a utility for traversing data, potentially in parallel. He called it Range, and it extended the Iterator interface. Some of its new methods were awkwardly familiar:
- int size() would give the exact size of the range;
- Range split() would separate the range into two pieces, preferably, although not necessarily, similar in size (modifying the current range and creating a new one);
- Range[] split(int n) would separate the range into n sub-ranges, attempting to make them as even as possible;
- void remove() was there from Iterator; its subtypes merely threw UnsupportedOperationException.
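A rough sketch of what such an interface might have looked like (reconstructed from the description above, not the actual code):

```java
import java.util.Iterator;

// Hypothetical reconstruction of the Range utility described above.
public interface Range<T> extends Iterator<T> {

    // Exact number of elements remaining in this range.
    int size();

    // Splits this range in two: this range keeps one part (it is modified),
    // the returned Range covers the rest. The parts are preferably,
    // though not necessarily, similar in size.
    Range<T> split();

    // Splits this range into n sub-ranges, as even as possible.
    Range<T>[] split(int n);

    // remove() is inherited from Iterator; implementations simply threw
    // UnsupportedOperationException.
}
```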
What I said to myself was that this is definitely what we can now do with (subsized) spliterators, thus taking advantage of the stream API. However, the spliterator in Java 8 does not provide an easy approach to splitting into n parts.
With a recursive function, we can split it into up to n = 2^L parts with L recursion levels, but the approach is not as trivial when n is not a power of two, not to mention that it feels unnatural to keep a utility function around just for that purpose.
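For illustration, a minimal sketch of such a recursive helper (the class and method names are mine, purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public final class Splitting {

    // Splits 'levels' times recursively, yielding up to 2^levels spliterators.
    public static <T> List<Spliterator<T>> splitRecursively(Spliterator<T> s, int levels) {
        List<Spliterator<T>> parts = new ArrayList<>();
        collect(s, levels, parts);
        return parts;
    }

    private static <T> void collect(Spliterator<T> s, int levels, List<Spliterator<T>> parts) {
        if (levels == 0) {
            parts.add(s);
            return;
        }
        Spliterator<T> prefix = s.trySplit(); // null if the spliterator cannot split further
        if (prefix != null) {
            collect(prefix, levels - 1, parts);
        }
        collect(s, levels - 1, parts);
    }
}
```

For example, splitRecursively(list.spliterator(), 3) yields up to eight parts, but there is no level L that yields exactly six.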
One might also say to simply avoid messing around with spliterators and let a stream do the splitting induced by forks during the actual processing, but the ForkJoin strategy might not be very adequate for the task, and does not guarantee that it will use the number of threads that we specifically wish to dedicate to the job. There could be cases of heavy tasks being performed on a small number of elements, in fact.
The question boils down to the following: having a spliterator with at least the characteristics SIZED and SUBSIZED, how can I split it into an exact number of spliterators?
Upvotes: 3
Views: 1789
Reputation: 298143
“… but the ForkJoin strategy might not be very adequate for the task”
sounds like premature optimization to me. You want to implement complicated things manually because the existing strategy might be inadequate…
“… and does not guarantee that it will use the number of threads that we specifically wish to dedicate to the job.”
There is no guarantee indeed, but the current stream implementation uses the common Fork/Join pool, which can be configured to a desired number of threads. Dedicating a specified number of threads to the task(s) is exactly the strategy you are asking for.
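For example, a minimal sketch of the two usual options (the custom-pool variant relies on an implementation detail of current JDKs rather than a documented guarantee):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelismDemo {
    public static void main(String[] args) throws Exception {
        // Option 1: configure the common pool's parallelism.
        // Must be set before the common pool is first used.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");

        // Option 2: run the terminal operation from inside a dedicated pool;
        // the stream's fork/join tasks are then executed in that pool.
        ForkJoinPool pool = new ForkJoinPool(4);
        long sum = pool.submit(() ->
                IntStream.rangeClosed(1, 1_000_000).parallel().asLongStream().sum()
        ).get();
        System.out.println(sum);
        pool.shutdown();
    }
}
```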
“There could be cases of heavy tasks being performed on a small number of elements, in fact.”
I don’t see any contradiction to how the F/J framework works. It will attempt to split until the desired parallelism has been reached. If that implies that each thread is working on a single item only, it will do so.
At this point, it’s worth noting that the default parallelism, matching the number of cores, is sufficient for any computational task that does not involve blocking, regardless of how much time processing a single item takes. As long as every thread has its workload, there is no acceleration possible beyond the number of actual hardware execution units.
In other words, the F/J framework implements the strategy you want to implement yourself (or a strategy superior to what you would implement), which brings us back to the first point: premature optimization.
Upvotes: 2
Reputation: 200148
The way to do it, which is not even that hacky and interoperates with the existing F/J support framework, is to code a spliterator wrapper which uses its own splitting policy. What you lose is the ability to leverage a natively random-access structure; you can only split by iterating over the inner spliterator's data set.
You can refer to my earlier answer where I present the code which splits into batches of predetermined size; it is quite easy to adapt it to your case with just an appropriate constructor call.
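Not that code verbatim, but a rough sketch of the same idea (the class name and details here are mine): a wrapper whose trySplit pulls a fixed-size batch out of the wrapped spliterator and returns it as an array-backed spliterator.

```java
import java.util.Comparator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;

// Wraps an arbitrary spliterator and splits off fixed-size batches,
// materializing each batch by consuming the wrapped spliterator.
public class FixedBatchSpliterator<T> implements Spliterator<T> {
    private final Spliterator<T> source;
    private final int batchSize;

    public FixedBatchSpliterator(Spliterator<T> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public Spliterator<T> trySplit() {
        Object[] batch = new Object[batchSize];
        int[] count = {0};
        // Pull up to batchSize elements out of the wrapped spliterator.
        while (count[0] < batchSize && source.tryAdvance(t -> batch[count[0]++] = t)) { }
        if (count[0] == 0) return null;   // nothing left to split off
        return Spliterators.spliterator(batch, 0, count[0], characteristics());
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        return source.tryAdvance(action);
    }

    @Override
    public long estimateSize() {
        return source.estimateSize();     // only an estimate once batches have been split off
    }

    @Override
    public int characteristics() {
        return source.characteristics();
    }

    @Override
    public Comparator<? super T> getComparator() {
        return source.getComparator();
    }
}
```

Choosing batchSize as roughly the total size divided by the desired number of parts should give approximately that many pieces.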
Upvotes: 2