VB_
VB_

Reputation: 45712

Stop java stream computations based on previous computation results

How to break stream computation based on previous results? If it's obvious that stream.filter(...).count() would be less than some number - how to stop stream computation?

I have the following code which checks if some sampleData passes the predicate test:

// sampleData.size() may be greater than 10.000.000
Set<String> sampleData = downloadFromWeb(); 
return sampleData.stream().filter(predicate::test).count() > sampleData.size() * coefficient;

I could have thousands of sampleData. The problem is that this code is ineffective. For example, if coefficient equals 0.5, sampleData.size() = 10_000_000, and first 5_000_000 elements fails the predicate::test - there is no reason to validate last 5_000_000 elements (count() will never be greater than 5_000_000).

Upvotes: 8

Views: 230

Answers (3)

Holger
Holger

Reputation: 298459

ZhekaKozlov’s answer is heading into the right direction, but it lacks the negation. For the matches to be larger than a certain threshold, the number of non matching elements must be smaller than “size - threshold”. If we test for the nonmatching elements to be smaller, we can apply a limit to stop once they become larger:

Set<String> sampleData = downloadFromWeb();
final long threshold = sampleData.size()-(long)(sampleData.size() * coefficient);
return sampleData.stream()
                 .filter(predicate.negate()).limit(threshold+1).count() < threshold;

There is, by the way, no reason to create a method reference to the test method of an existing Predicate like with predicate::test. Just pass the Predicate to the filter method. The code above also uses predicate.negate() instead of predicate.negate()::test

Upvotes: 4

Eugene
Eugene

Reputation: 120978

To be honest I am not quite sure this would be correct, I hope someone will come along and review this, but here is my idea of using a custom spliterator:

 static class CustomSpl<T> extends AbstractSpliterator<T> {

    private Spliterator<T> source;

    private int howMany;

    private int coefficient;

    private Predicate<T> predicate;

    private T current;

    private long initialSize;

    private void setT(T t) {
        this.current = t;
    }

    public CustomSpl(Spliterator<T> source, int howMany, int coefficient, Predicate<T> predicate, long initialSize) {
        super(source.estimateSize(), source.characteristics());
        this.source = source;
        this.howMany = howMany;
        this.coefficient = coefficient;
        this.predicate = predicate;
        this.initialSize = initialSize;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        boolean hasMore = source.tryAdvance(this::setT);

        System.out.println(current);

        if (!hasMore) {
            return false;
        }

        if (predicate.test(current)) {
            ++howMany;
        }

        if (initialSize - howMany <= coefficient) {
            return false;
        }

        action.accept(current);
        return true;
    }

}

And for example this will produce only 4 elements, since we said to only care having a coefficient 5:

Spliterator<Integer> sp = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).stream().spliterator();

long count = StreamSupport.stream(new CustomSpl<>(sp, 0, 5, x -> x > 3, sp.getExactSizeIfKnown()), false)
            .count();

Also this is possible for spliterators with known size only.

Upvotes: 4

ZhekaKozlov
ZhekaKozlov

Reputation: 39596

Set<String> sampleData = downloadFromWeb();  
int size = (int) (sampleData.size() * coefficient);
return sampleData.stream().filter(predicate::test).limit(size + 1).count() > size;

Upvotes: 0

Related Questions