Jeffrey Bosboom
Jeffrey Bosboom

Reputation: 13663

Find the minimum element of a stream, but bail out early if it's <= N

I want to find the minimum element of a large (hundreds of millions of elements) IntStream, but I can only use the result if it is > N, so I want to bail out as soon as I find an element <= N. I expect the minimum to be <= N most of the time.

IntStream.min() doesn't short-circuit, so I'd be stuck processing all the elements. The general IntStream.reduce doesn't short-circuit either.

IntStream.noneMatch(x -> x <= N) will ensure that the minimum element is > N and does short-circuit if it isn't, but doesn't actually tell me that minimum. I'd have to maintain state in the predicate (and add synchronization or be limited to sequential streams) to remember the actual minimum. Alternately, I could increment N and try again, possibly doing some sort of binary search over a likely range of N, but that sounds both slow and complicated.

How can I find the minimum of an IntStream, short-circuiting as soon as it's known to be <= N?

Upvotes: 7

Views: 391

Answers (2)

Stuart Marks
Stuart Marks

Reputation: 132460

My first thought was to use findAny but then I reread the question. :-)

The difference, of course, is that findAny (or findFirst) short-circuits as soon as it finds a matching element. You want to short-circuit if a non-matching element is found, then reduce or accumulate the rest.

While accumulation is a form of mutation and is frowned upon by FP purists, it really comes in handy sometimes. Java 8 has some nice additions like LongAccumulator that do accumulation of primitive values in a low-contention way, making them suitable for parallel processing. You can put the accumulation step into the peek stream operation.

Unfortunately there is no "IntAccumulator" so you have to do processing using long values. You can either turn your source into a LongStream or map the int values into long values.

Once you've done that, it's probably most straightforward to handle short-circuiting using allMatch. Naturally, if allMatch returns false, short-circuiting has occurred and the accumulator doesn't actually have the minimum value. But it should have the minimum value examined so far, probably the one that triggered the short-circuiting.

Putting it together would look something like this:

IntStream istream = ... 
LongAccumulator acc = new LongAccumulator(Long::min, Long.MAX_VALUE);

if (istream.mapToLong(i -> i).peek(acc::accumulate).allMatch(i -> i > N)) {
    System.out.println("min was " + acc.get());
} else {
    System.out.println("a value was <= " + N);
}

Upvotes: 3

Others
Others

Reputation: 3023

You can cheat a little bit using anyMatch. Caution, this is untested and sorta sketchy. Note that, although it has a side effect, the inner lambda is clearly not stateful with regards to the stream.

final AtomicInteger min=new AtomicInteger(Integer.MAX_VALUE);
final int minLimit=?;
boolean isValid=!stream.anyMatch(i->{
    if(i<=minLimit){
        return true;
    }
    min.getAndAccumulate(i, Math::min);
    return false;
});

This works by exploiting anyMatch's short circuiting.

In actual fact, it may be impossible to do this without storing state outside the lambda. This is because in order to do this we must be both stateful, to store the minimum so far, short circuiting, and have user defined behavior. The docs state that there are no such operators on IntStream.(Only limit is both stateful and short-circuiting). This seems like an issue about blocking/parallelism.

--Update-- Found an official source for the comment about parallelism:

"Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.

Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless. Such parameters are always instances of a functional interface such as Function, and are often lambda expressions or method references."

Looks like the issue is the user supplied behavior, not the short circuiting.

Upvotes: 3

Related Questions