Porthos3

Reputation: 401

Java Streams - Filtering on previously filtered values

I am experimenting with Java's Streams and trying to figure out what is possible as well as their strengths and weaknesses. Currently I am trying to implement the Sieve of Eratosthenes using a stream, but cannot seem to find a good way to loop through previously filtered values without storing them in a separate collection.

I want to accomplish something like this:

IntStream myStream = IntStream.range(0,4);
myStream.filter(s -> {
    System.out.print("[filtering "+s+"] ");
    myStream.forEach(q -> System.out.print(q+", "));
    System.out.println();
    return true; //eventually respond to values observed on the line above
});

With a desired output of:

[filtering 0] 
[filtering 1] 0, 
[filtering 2] 0, 1, 
[filtering 3] 0, 1, 2, 

Note that while each new value is being filtered, all previously filtered values are observed. This would allow an easy implementation of the Sieve of Eratosthenes: I could filter out all non-prime values, checking each new value for divisibility against all numbers that have previously passed the prime filter.

However, the above example gives me an error in NetBeans:

local variables referenced from a lambda expression must be final or effectively final

This appears to be because I am referencing myStream within a filter that is already acting on myStream. Is there any good way of working around this error (i.e., making a final copy of the stream containing only the values that have been filtered so far), or is there a better approach to this sort of problem that does not use a separate collection to store values?

Upvotes: 7

Views: 2561

Answers (4)

a better oliver

Reputation: 26828

It's debatable whether a stream is the right tool here, but .filter() definitely isn't. The Stream API documentation requires the predicate passed to filter to be stateless, so the idea shouldn't come up in the first place. Based on the example in your answer, a collector might be a feasible solution.

List<Integer> primes = IntStream.range(2, UPPER_BOUND)
  .collect(ArrayList::new,
          (list, number) -> { 
                for(int j=0; j < list.size(); j++) {
                    int prime = list.get(j);

                    if(prime > Math.sqrt(number)) {
                        break;
                    }

                    if(number % prime == 0) {
                        return;
                    }
                }

                list.add(number);
          },
          List::addAll);

ArrayList::new creates a new list, which is then passed to the consumer as list. The consumer is called for every element in the stream, with number being the element.

List::addAll (the combiner) would only be relevant for parallel streams, which can't be used for this algorithm anyway.
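For trying the collector out, here is a self-contained variant of it. This is a sketch: CollectorPrimes and primesBelow are hypothetical names, upperBound stands in for UPPER_BOUND, and Math.sqrt is swapped for the equivalent integer test prime * prime > number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class CollectorPrimes {
    // Hypothetical helper: all primes below upperBound, collected with a three-argument collect.
    static List<Integer> primesBelow(int upperBound) {
        return IntStream.range(2, upperBound)
            .collect(ArrayList<Integer>::new,   // supplier: the list of primes found so far
                (list, number) -> {             // accumulator: called once per number, in order
                    for (int prime : list) {
                        if ((long) prime * prime > number) {
                            break;              // same as prime > sqrt(number)
                        }
                        if (number % prime == 0) {
                            return;             // composite: do not add it
                        }
                    }
                    list.add(number);
                },
                ArrayList::addAll);             // combiner: only used by parallel streams
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(50));
    }
}
```

The accumulator reads the very list it is building, which is exactly why the combiner is of no real use here: partial lists built in parallel would lack the smaller primes.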

Upvotes: 1

Helder Pereira

Reputation: 5756

I managed to create an infinite Stream of prime numbers using the Sieve of Eratosthenes, but it actually does not use past values. Instead, it removes the multiples of a prime in the tail (in a lazy way, because the tail is infinite), like the original Sieve of Eratosthenes algorithm. For that, I used an Iterator as auxiliary (because the Stream can only be used once) and implemented a lazyConcat for streams.

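import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class StreamUtils {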
    public static IntStream fromIterator(PrimitiveIterator.OfInt it) {
        return StreamSupport.intStream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static IntStream lazyConcat(Supplier<IntStream> a, Supplier<IntStream> b) {
        return StreamSupport.intStream(new Spliterator.OfInt() {
            boolean beforeSplit = true;
            Spliterator.OfInt spliterator;

            @Override
            public OfInt trySplit() {
                return null;
            }

            @Override
            public long estimateSize() {
                return Long.MAX_VALUE;
            }

            @Override
            public int characteristics() {
                return Spliterator.ORDERED;
            }

            @Override
            public boolean tryAdvance(IntConsumer action) {
                boolean hasNext;
                if (spliterator == null) {
                    spliterator = a.get().spliterator();
                }
                hasNext = spliterator.tryAdvance(action);
                if (!hasNext && beforeSplit) {
                    beforeSplit = false;
                    spliterator = b.get().spliterator();
                    hasNext = spliterator.tryAdvance(action);
                }
                return hasNext;
            }
        }, false);
    }
}

My Sieve of Eratosthenes stream looks like this:

class Primes {
    public static IntStream stream() {
        return sieve(IntStream.iterate(2, n -> n + 1));
    }

    private static IntStream sieve(IntStream s) {
        PrimitiveIterator.OfInt it = s.iterator();
        int head = it.nextInt();
        IntStream tail = StreamUtils.fromIterator(it);
        return StreamUtils.lazyConcat(
                () -> IntStream.of(head),
                () -> sieve(tail.filter(n -> n % head != 0)));
    }
}

Then we can use it this way:

System.out.println(Primes.stream().limit(20).boxed().collect(Collectors.toList()));

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

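I think it was a good exercise, but it is quite inefficient and not stack-friendly at all: every prime found wraps the remaining tail in one more filter and one more lazyConcat layer, so both the work done per number and the call-stack depth grow with the number of primes produced.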

Upvotes: 3

Porthos3

Reputation: 401

Other answers have suggested that the approach I had been trying is not possible, and that a separate collection must be used.

To give a more complete answer, I wanted to show a valid approach to this problem using streams and compare it against a more traditional approach.

Listing primes using streams (trial division by the primes found so far, the adaptation of the Sieve of Eratosthenes described in the question):

List<Integer> primes = new ArrayList<Integer>();

IntStream.range(2, UPPER_BOUND) // same numbers as the loop below: 2 <= i < UPPER_BOUND
    .filter(i -> {
        for(int j=0; j<primes.size(); j++) {
            int prime = primes.get(j);

            if(prime > Math.sqrt(i)) {
                break;
            }

            if(i % prime == 0) {
                return false;
            }
        }
        return true;
    })
    .forEach(primes::add);

Traditional, equivalent approach without streams:

List<Integer> primes = new ArrayList<Integer>();

for(int i=2; i < UPPER_BOUND; i++) {
    boolean isPrime = true;

    for(int j=0; j<primes.size(); j++) {
        int prime = primes.get(j);

        if(prime > Math.sqrt(i)) {
            break;
        }

        if(i % prime == 0) {
            isPrime = false;
            break;
        }
    }

    if(isPrime) {
        primes.add(i);
    }
}
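A quick way to check that the two versions really produce the same list, provided the stream iterates over exactly the same numbers as the loop (2 <= i < upperBound, written here as IntStream.range). This is a sketch; CompareApproaches, streamPrimes and loopPrimes are hypothetical names, and upperBound stands in for UPPER_BOUND.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class CompareApproaches {
    // Stream version: the filter reads the primes that forEach has already added.
    static List<Integer> streamPrimes(int upperBound) {
        List<Integer> primes = new ArrayList<>();
        IntStream.range(2, upperBound)
            .filter(i -> {
                for (int prime : primes) {
                    if (prime > Math.sqrt(i)) {
                        break;          // no divisor above sqrt(i) needs testing
                    }
                    if (i % prime == 0) {
                        return false;   // divisible by an earlier prime
                    }
                }
                return true;
            })
            .forEach(primes::add);      // only correct on a sequential stream
        return primes;
    }

    // Loop version over the same range.
    static List<Integer> loopPrimes(int upperBound) {
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < upperBound; i++) {
            boolean isPrime = true;
            for (int prime : primes) {
                if (prime > Math.sqrt(i)) {
                    break;
                }
                if (i % prime == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes.add(i);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        int upperBound = 1_000; // hypothetical bound for a quick check
        System.out.println(streamPrimes(upperBound).equals(loopPrimes(upperBound)));
    }
}
```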

Performance Comparison:

Some experimentation with each function consistently showed that the traditional approach is faster than the stream approach in this case: finding all prime numbers under one million took about 1.5 times as long with streams (an average of 106 ms versus 70 ms on my machine).

This gap could perhaps be closed if the stream's .parallel() method allowed easy parallelization of the problem. However, parallelization is not easy in this case: ArrayList is not thread-safe, so concurrent additions will quickly cause errors and/or inaccurate results. The algorithm also relies on every prime up to sqrt(i) already being in the list when i is tested, which a parallel stream does not guarantee.
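One way around the shared-list problem is to drop the shared state altogether. The sketch below swaps in a plain stateless trial-division test (by 2 and by odd numbers up to the square root) instead of the list of earlier primes; ParallelPrimes, isPrime and primesBelow are hypothetical names. Because the predicate reads no shared state, it satisfies the stateless requirement on filter and can run on a parallel stream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPrimes {
    // Stateless primality test: reads no shared state, so it is safe in a parallel stream.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;
        }
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: all primes below upperBound, filtered in parallel.
    static List<Integer> primesBelow(int upperBound) {
        return IntStream.range(2, upperBound)
            .parallel()
            .filter(ParallelPrimes::isPrime)
            .boxed()
            .collect(Collectors.toList()); // encounter order is kept for an ordered stream
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(100));
    }
}
```

This tests more divisors per number than the list-based version (every odd number, not just primes), so whether the parallel speed-up outweighs that extra work would have to be measured.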

Conclusion:

Assuming the other answers are correct, filtering already-filtered data within a filter on that same stream is not possible in Java.

Listing primes can be tackled using streams. However, pending a better solution than my own, it is currently better to stick with a traditional stream-less approach.

Upvotes: 0

Eran

Reputation: 393831

You can't process a Stream more than once. Once myStream.filter(...) has been called, myStream counts as already operated upon, so calling myStream.forEach inside the filter throws an IllegalStateException at run time.
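A minimal sketch of that run-time failure, using the question's code with a terminal operation added (StreamReuse and reuseInsideFilterThrows are hypothetical names). Here myStream is never reassigned, so it is effectively final and the lambda compiles; the problem only appears once the pipeline actually runs.

```java
import java.util.stream.IntStream;

public class StreamReuse {
    // Returns true if using myStream again inside its own filter throws IllegalStateException.
    static boolean reuseInsideFilterThrows() {
        IntStream myStream = IntStream.range(0, 4);
        try {
            myStream.filter(s -> {
                // myStream was already operated upon by the .filter(...) call above
                myStream.forEach(q -> System.out.print(q + ", "));
                return true;
            }).forEach(s -> { }); // terminal operation, so the filter actually runs
            return false;
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reuseInsideFilterThrows());
    }
}
```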

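Instead, you could create a new IntStream inside the filter. IntStream.range(0, s) reproduces the earlier values here only because the input is a contiguous range and the filter lets every value through.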

Note that you also have to add a terminal operation to the outer Stream pipeline; otherwise it is never processed:

IntStream myStream = IntStream.range(0,4);
myStream.filter(s -> {
    System.out.print("[filtering "+s+"] ");
    IntStream.range(0,s).forEach(q -> System.out.print(q+", "));
    System.out.println();
    return true; //eventually respond to values observed on the line above
}).forEach(i->{});

This produces:

[filtering 0] 
[filtering 1] 0, 
[filtering 2] 0, 1, 
[filtering 3] 0, 1, 2, 

Upvotes: 2
