Reputation: 1377
I'm trying to understand how Spliterator works, and how spliterators are designed. I recognize that trySplit() is likely one of the more important methods of Spliterator, but when I look at some third-party Spliterator implementations, I sometimes see that their spliterators return null from trySplit() unconditionally.
The questions:
… Spliterator that returns null unconditionally? It seems like such a spliterator defeats the point of, well, splitting.
… trySplit(), but is there a legitimate use case for a spliterator that unconditionally returns null?
Upvotes: 12
Views: 1463
Reputation: 298599
There are more advantages than just splitting support:
The iteration logic is contained in a single tryAdvance method rather than being spread over two methods like hasNext and next. Splitting the logic over two methods complicates a lot of Iterator implementations, as it often implies that the hasNext method has to perform an actual query attempt that might yield a value, which then has to be remembered for the follow-up next call. And the fact that this query has been made must be remembered as well, either explicitly or implicitly.
It would be easier if there were a guarantee that hasNext/next are always called in the typical alternating fashion; however, there is no such guarantee.
One example is BufferedReader.readLine(), which maps directly onto simple tryAdvance logic. A wrapping Iterator has to call that method within the hasNext implementation and remember the line for the next call. (Ironically, the current BufferedReader.lines() implementation does implement such a complicated Iterator, which is then wrapped into a Spliterator, instead of implementing the much simpler Spliterator directly. It seems that the “I’m not familiar with that” problem should not be underestimated.)
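To make the contrast concrete, here is a sketch with both shapes side by side over BufferedReader.readLine(). LineIterator and LineSpliterator are hypothetical names for illustration, not JDK classes. The Iterator needs a pending field to carry the line read in hasNext() over to next(); the Spliterator does everything in one tryAdvance call. LineSpliterator extends Spliterators.AbstractSpliterator, which supplies a batching trySplit, so only tryAdvance has to be written.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class ReadLineDemo {
    // readLine() returns the next line, or null at end of input.
    static String readLine(BufferedReader reader) {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Iterator style: hasNext() has to perform the read and remember the result for next().
    static final class LineIterator implements Iterator<String> {
        private final BufferedReader reader;
        private String pending; // line read by hasNext() but not yet returned by next()

        LineIterator(BufferedReader reader) {
            this.reader = reader;
        }

        @Override
        public boolean hasNext() {
            if (pending == null) {
                pending = readLine(reader); // remains null once the input is exhausted
            }
            return pending != null;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String line = pending;
            pending = null; // the next hasNext() call must read again
            return line;
        }
    }

    // Spliterator style: a single method reads, delivers the line, and reports success.
    static final class LineSpliterator extends Spliterators.AbstractSpliterator<String> {
        private final BufferedReader reader;

        LineSpliterator(BufferedReader reader) {
            super(Long.MAX_VALUE, ORDERED | NONNULL); // Long.MAX_VALUE: size unknown
            this.reader = reader;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            String line = readLine(reader);
            if (line == null) {
                return false;
            }
            action.accept(line);
            return true;
        }
    }

    static String joinViaIterator(String text) {
        Iterator<String> it = new LineIterator(new BufferedReader(new StringReader(text)));
        return StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL), false)
                .collect(Collectors.joining(","));
    }

    static String joinViaSpliterator(String text) {
        return StreamSupport.stream(new LineSpliterator(new BufferedReader(new StringReader(text))), false)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joinViaIterator("alpha\nbeta\ngamma"));    // alpha,beta,gamma
        System.out.println(joinViaSpliterator("alpha\nbeta\ngamma")); // alpha,beta,gamma
    }
}
```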
estimateSize(): a Spliterator may return an estimate (or even an exact number) of the remaining items, which can be used to pre-allocate resources. This can improve efficiency.
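A small sketch of how a consumer can use that size information. The helper copyRemaining is hypothetical; it relies on the real Spliterator.getExactSizeIfKnown(), which returns the size when the spliterator reports SIZED and -1 otherwise, to pre-size its ArrayList. A spliterator obtained by wrapping an Iterator with Spliterators.spliteratorUnknownSize has no such information to offer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

public class SizeHintDemo {
    // Hypothetical helper: copies the remaining elements, pre-sizing the list when the
    // spliterator knows its exact size (getExactSizeIfKnown() returns -1 otherwise).
    static <T> List<T> copyRemaining(Spliterator<T> sp) {
        long exact = sp.getExactSizeIfKnown();
        List<T> out = (exact >= 0 && exact <= Integer.MAX_VALUE)
                ? new ArrayList<T>((int) exact)
                : new ArrayList<T>();
        sp.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        Spliterator<String> sized = words.spliterator();
        Spliterator<String> unknown =
                Spliterators.spliteratorUnknownSize(words.iterator(), Spliterator.ORDERED);
        System.out.println(sized.getExactSizeIfKnown());              // 3
        System.out.println(unknown.getExactSizeIfKnown());            // -1
        System.out.println(unknown.estimateSize() == Long.MAX_VALUE); // true
        System.out.println(copyRemaining(sized));                     // [a, b, c]
    }
}
```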
characteristics(): Spliterators can provide additional information about their content or behavior. Besides telling whether the estimated size is an exact size, they can tell you whether you may see null values, whether there is a defined encounter order, or whether all values are distinct. A particular algorithm may take advantage of this. Clearly, the Stream API is built up from such algorithms, so when you plan to create (or support the creation of) streams and have a choice, implementing a Spliterator that reports as much meta-information as possible is preferable to implementing an Iterator that will be wrapped later.
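A quick way to see what gets lost when an Iterator is wrapped: the sketch below compares the characteristics reported by TreeSet's own spliterator (documented to report SIZED, DISTINCT, SORTED and ORDERED) with those of a spliterator built from the same set's Iterator with no extra characteristics passed in. The describe helper is just for printing.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.TreeSet;

public class CharacteristicsDemo {
    // Formats a few of the characteristic flags a stream pipeline can exploit.
    static String describe(Spliterator<?> sp) {
        return "SIZED=" + sp.hasCharacteristics(Spliterator.SIZED)
                + " ORDERED=" + sp.hasCharacteristics(Spliterator.ORDERED)
                + " DISTINCT=" + sp.hasCharacteristics(Spliterator.DISTINCT)
                + " SORTED=" + sp.hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>(Arrays.asList("b", "a", "c"));
        // The collection's own spliterator reports what it knows about its contents.
        System.out.println(describe(set.spliterator()));
        // SIZED=true ORDERED=true DISTINCT=true SORTED=true

        // Wrapping the Iterator loses that knowledge unless the caller passes flags in.
        System.out.println(describe(Spliterators.spliteratorUnknownSize(set.iterator(), 0)));
        // SIZED=false ORDERED=false DISTINCT=false SORTED=false
    }
}
```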
Upvotes: 3
Reputation: 7380
While the main advantage of Spliterator over Iterator is, as you said, its trySplit() method which allows it to be parallelized, there are other significant advantages:
http://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html
The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator, and to avoid the inherent race involved in having separate methods for hasNext() and next().
Furthermore, Spliterators can be converted directly to Streams using StreamSupport.stream, to make use of Java 8's streams.
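For example, a bare Iterable (not a Collection, so it has no stream() method) can be turned into a Stream through its default spliterator() method. The timesTen helper is just an illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class IterableToStreamDemo {
    static List<Integer> timesTen(Iterable<Integer> source) {
        // Iterable.spliterator() has a default implementation; StreamSupport.stream wraps it.
        // The second argument (false) requests a sequential stream.
        return StreamSupport.stream(source.spliterator(), false)
                .map(x -> x * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A plain Iterable written as a lambda; it offers only iterator().
        Iterable<Integer> numbers = () -> Arrays.asList(1, 2, 3).iterator();
        System.out.println(timesTen(numbers)); // [10, 20, 30]
    }
}
```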
Upvotes: 6
Reputation: 132610
One of the purposes of a Spliterator is to be able to split, but that's not the only purpose. The other main purpose is as a support class for creating your own Stream source. One way to create a Stream source is to implement your own Spliterator and pass it to StreamSupport.stream. The simplest thing to do is often to write a Spliterator that can't split. Doing so forces the stream to execute sequentially, but that might be acceptable for whatever you're trying to do.
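To illustrate the "forces the stream to execute sequentially" point, here is a sketch with a hypothetical CountingSpliterator whose trySplit always returns null. Even though StreamSupport.stream is asked for a parallel stream (second argument true), there is nothing to hand off to other threads, so all elements should end up being processed by the calling thread. The program collects the names of the threads it observes.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class NonSplittingSourceDemo {
    // Hypothetical source: the integers [0, end), handed out one at a time.
    static final class CountingSpliterator implements Spliterator<Integer> {
        private int next;
        private final int end;

        CountingSpliterator(int end) {
            this.end = end;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= end) {
                return false;
            }
            action.accept(next++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            return null; // never split: one task traverses every element
        }

        @Override
        public long estimateSize() {
            return end - next; // exact, hence SIZED below
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | NONNULL;
        }
    }

    // Requests a parallel stream, but the source offers no way to divide the work.
    static Set<String> threadsUsed(int n) {
        return StreamSupport.stream(new CountingSpliterator(n), true)
                .map(i -> Thread.currentThread().getName())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(threadsUsed(100_000).size()); // 1
    }
}
```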
There are other cases where writing a non-splittable Spliterator makes sense. For example, in OpenJDK there are implementations such as EmptySpliterator that contain no elements. Of course it can't be split. A similar case is a singleton spliterator that contains exactly one element. It can't be split either. Both implementations return null unconditionally from trySplit.
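Both are reachable through public JDK methods: Spliterators.emptySpliterator() and the spliterator of Collections.singletonList. A quick check that they report sizes 0 and 1 and refuse to split:

```java
import java.util.Collections;
import java.util.Spliterator;
import java.util.Spliterators;

public class TrivialSpliteratorsDemo {
    public static void main(String[] args) {
        Spliterator<String> empty = Spliterators.emptySpliterator();
        Spliterator<String> single = Collections.singletonList("only").spliterator();

        // Nothing to divide: both return null from trySplit().
        System.out.println(empty.estimateSize() + " " + (empty.trySplit() == null));   // 0 true
        System.out.println(single.estimateSize() + " " + (single.trySplit() == null)); // 1 true
    }
}
```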
Another case is where writing a non-splittable Spliterator is easy and effective, and the amount of code necessary to implement a splittable one is prohibitive. (At least, not worth the effort of writing one into a Stack Overflow answer.) For example, see the example Spliterator from this answer. The case here is that the Spliterator implementation wants to wrap another Spliterator and do something special, in this case check whether the stream is empty. Otherwise it just delegates everything to the wrapped Spliterator. Doing this with a non-splittable Spliterator is pretty easy.
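The linked answer's code isn't reproduced here. As a rough sketch of the same idea, the hypothetical helper requireNonEmpty wraps a stream's Spliterator, delegates to it, and throws a caller-supplied exception if the first tryAdvance finds nothing. Returning null from trySplit keeps the "have we seen an element yet?" state in one place. Clearing SIZED and SUBSIZED is my own assumption about what is needed so that a terminal operation such as count() cannot answer from the size alone (which newer JDKs may do) and skip the check.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class NonEmptyStreamDemo {
    // Hypothetical helper: behaves like `stream`, but throws onEmpty.get() if it has no elements.
    static <T> Stream<T> requireNonEmpty(Stream<T> stream, Supplier<? extends RuntimeException> onEmpty) {
        Spliterator<T> source = stream.spliterator(); // lazy: nothing is pulled yet
        Spliterator<T> checking = new Spliterator<T>() {
            private boolean seenElement;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                boolean advanced = source.tryAdvance(action);
                if (advanced) {
                    seenElement = true;
                } else if (!seenElement) {
                    throw onEmpty.get(); // the very first attempt found nothing
                }
                return advanced;
            }

            @Override
            public Spliterator<T> trySplit() {
                return null; // splitting would require sharing the seenElement state
            }

            @Override
            public long estimateSize() {
                return source.estimateSize();
            }

            @Override
            public int characteristics() {
                // Drop the size flags so every terminal operation really traverses.
                return source.characteristics() & ~(SIZED | SUBSIZED);
            }
        };
        return StreamSupport.stream(checking, false).onClose(stream::close);
    }

    public static void main(String[] args) {
        System.out.println(requireNonEmpty(Stream.of("a", "b"), IllegalStateException::new).count()); // 2
        try {
            requireNonEmpty(Stream.<String>empty(), () -> new IllegalStateException("empty")).count();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // caught: empty
        }
    }
}
```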
Notice that there's discussion in that answer, the comment on that answer, in my answer to the same question, and the comment thread on my answer, about how one would make a splittable (i.e., parallel-ready) Spliterator. But nobody actually wrote out the code to do the splitting. :-) Depending upon how much laziness you want to preserve from the original stream, and how much parallel efficiency you want, writing a splittable Spliterator can get pretty complicated.
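For contrast, here is what trySplit looks like when the source is trivially divisible. IntRangeSpliterator is a hypothetical example over an int range, not a solution to the lazy-wrapper problem above; it only shows the basic contract: hand off a prefix (required when reporting ORDERED), keep the remainder, and return null once the range is too small to divide.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class SplittableRangeDemo {
    // Hypothetical splittable source: the integers in [from, to).
    static final class IntRangeSpliterator implements Spliterator<Integer> {
        private int from;     // next element to hand out (inclusive)
        private final int to; // end of the range (exclusive)

        IntRangeSpliterator(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (from >= to) {
                return false;
            }
            action.accept(from++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            int mid = (from + to) >>> 1;
            if (mid <= from) {
                return null; // fewer than two elements left: not worth splitting
            }
            // ORDERED: return the prefix [from, mid) and keep the suffix [mid, to).
            Spliterator<Integer> prefix = new IntRangeSpliterator(from, mid);
            from = mid;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return to - from; // exact for this spliterator and every split-off part
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL | DISTINCT | IMMUTABLE;
        }
    }

    static long parallelSum(int n) {
        return StreamSupport.stream(new IntRangeSpliterator(0, n), true)
                .mapToLong(Integer::longValue)
                .sum();
    }

    static List<Integer> parallelList(int n) {
        return StreamSupport.stream(new IntRangeSpliterator(0, n), true)
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000)); // 499500
        System.out.println(parallelList(5));    // [0, 1, 2, 3, 4]
    }
}
```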
In my estimation, it's somewhat easier to do this sort of thing by writing an Iterator instead of a Spliterator (as in my answer noted above). It turns out that Spliterators.spliteratorUnknownSize can provide a limited amount of parallelism, even from an Iterator, which seems to be a purely sequential construct. It does so within IteratorSpliterator, which pulls multiple elements from the Iterator and processes them in batches. Unfortunately, the batch size is hardcoded, but at least this gives the opportunity to process elements pulled from an Iterator in parallel in certain cases.
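A sketch of that batching in action. It wraps a plain Iterator with Spliterators.spliteratorUnknownSize, calls trySplit() once, and counts how many elements ended up in the split-off batch versus what was left behind. The exact batch size is a JDK implementation detail, so the program prints it rather than assuming a value; the counts should add up to the total either way.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class IteratorBatchingDemo {
    // A plain Iterator over 0..n-1: no size information, no splitting logic of its own.
    static Iterator<Integer> numbers(int n) {
        return IntStream.range(0, n).boxed().iterator();
    }

    // Splits once and returns {elements in the split-off batch, elements left behind}.
    static long[] splitOnce(int n) {
        Spliterator<Integer> sp = Spliterators.spliteratorUnknownSize(numbers(n), Spliterator.ORDERED);
        Spliterator<Integer> batch = sp.trySplit(); // copies a batch from the Iterator into an array
        long[] counts = new long[2];
        if (batch != null) {
            batch.forEachRemaining(x -> counts[0]++);
        }
        sp.forEachRemaining(x -> counts[1]++);
        return counts;
    }

    // In a parallel stream, such batches can be handed to different worker threads.
    static long parallelSum(int n) {
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(numbers(n), Spliterator.ORDERED), true)
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        long[] counts = splitOnce(5_000);
        System.out.println("batch: " + counts[0] + ", rest: " + counts[1]);
        System.out.println(parallelSum(5_000)); // 12497500
    }
}
```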
Upvotes: 4