Jeffrey Bosboom
Jeffrey Bosboom

Reputation: 13653

How do I lazily concatenate streams?

I'm trying to implement a stream that uses another instance of itself in its implementation. The stream has a few constant elements prepended (with IntStream.concat) to it, so this should work as long as the concatenated stream creates the non-constant part lazily. I think using the StreamSupport.intStream overload taking a Supplier with IntStream.concat (which "creates a lazily concatenated stream") should be lazy enough to only create the second spliterator when elements are demanded from it, but even creating the stream (not evaluating it) overflows the stack. How can I lazily concatenate streams?


I'm attempting to port the streaming prime number sieve from this answer into Java. This sieve uses another instance of itself (ps = postponed_sieve() in the Python code). If I break the initial four constant elements (yield 2; yield 3; yield 5; yield 7;) into their own stream, it's easy to implement the generator as a spliterator:

/**
 * based on https://stackoverflow.com/a/10733621/3614835
 */
static class PrimeSpliterator extends Spliterators.AbstractIntSpliterator {
    private static final int CHARACTERISTICS = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED | Spliterator.SORTED;
    private final Map<Integer, Supplier<IntStream>> sieve = new HashMap<>();
    private final PrimitiveIterator.OfInt postponedSieve = primes().iterator();
    private int p, q, c = 9;
    private Supplier<IntStream> s;
    PrimeSpliterator() {
        super(105097564 /* according to Wolfram Alpha */ - 4 /* in prefix */,
                CHARACTERISTICS);
        //p = next(ps) and next(ps) (that's Pythonic?)
        postponedSieve.nextInt();
        this.p = postponedSieve.nextInt();
        this.q = p*p;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        for (; c > 0 /* overflow */; c += 2) {
            Supplier<IntStream> maybeS = sieve.remove(c);
            if (maybeS != null)
                s = maybeS;
            else if (c < q) {
                action.accept(c);
                return true; //continue
            } else {
                s = () -> IntStream.iterate(q+2*p, x -> x + 2*p);
                p = postponedSieve.nextInt();
                q = p*p;
            }
            int m = s.get().filter(x -> !sieve.containsKey(x)).findFirst().getAsInt();
            sieve.put(m, s);
        }
        return false;
    }
}

My first attempt at the primes() method returns an IntStream concatenating a constant stream with a new PrimeSpliterator:

public static IntStream primes() {
    return IntStream.concat(IntStream.of(2, 3, 5, 7),
            StreamSupport.intStream(new PrimeSpliterator()));
}

Calling primes() results in a StackOverflowError because primes() always instantiates a PrimeSpliterator, but PrimeSpliterator's field initializer always calls primes(). However, there's an overload of StreamSupport.intStream that takes a Supplier, which should allow lazily creating the PrimeSpliterator:

public static IntStream primes() {
    return IntStream.concat(IntStream.of(2, 3, 5, 7),
            StreamSupport.intStream(PrimeSpliterator::new, PrimeSpliterator.CHARACTERISTICS, false));
}

However, I instead get a StackOverflowError with a different backtrace (trimmed, as it repeats). Note that the recursion is entirely in the call to primes() -- the terminal operation iterator() is never invoked on a returned stream.

Exception in thread "main" java.lang.StackOverflowError
    at java.util.stream.StreamSpliterators$DelegatingSpliterator$OfInt.<init>(StreamSpliterators.java:582)
    at java.util.stream.IntPipeline.lazySpliterator(IntPipeline.java:155)
    at java.util.stream.IntPipeline$Head.lazySpliterator(IntPipeline.java:514)
    at java.util.stream.AbstractPipeline.spliterator(AbstractPipeline.java:352)
    at java.util.stream.IntPipeline.spliterator(IntPipeline.java:181)
    at java.util.stream.IntStream.concat(IntStream.java:851)
    at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
    at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
    at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
    at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
    at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
    at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
    at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
    at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
    at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
    at java.util.stream.IntStream.concat(IntStream.java:851)
    at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
    at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
    at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
    at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
    at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
    at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
    at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
    at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
    at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
    at java.util.stream.IntStream.concat(IntStream.java:851)
    at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)

How can I concatenate streams lazily enough to allow a stream to use another copy of itself in its implementation?

Upvotes: 8

Views: 2281

Answers (3)

ds1
ds1

Reputation: 11

@selalerercapitolis's answer was very helpful for my use case, which has nothing to do with mutation: to get elements from the concatenated stream until one of the elements doesn't have a certain property.

Example:

Stream<String> concatenated = Stream.of(supplierA, supplierB).flatMap(Supplier::get); // selalerercapitolis's answer.
List<String> result = concatenated.takeWhile(s -> s.length() > 5).toList();

supplierB generates a stream that is a bit expensive to create, so I was avoiding creating it if one of the elements generated by supplierA fails the condition tested by takeWhile(). That is, if result will only contain elements generated by supplierA.

Upvotes: 0

selalerercapitolis
selalerercapitolis

Reputation: 56

I like to use Supplier to do that:

return Stream.<Supplier<Stream<WhatEver>>of(
  () -> generateStreamOfWhatEverAndChangeSomeState(input, state),
  () -> generateStreamOfMoreWhatEversDependendingOnMutatedState(state)
).flatMap(Supplier::get);

Since stream is lazily evaluated the generateStreamOfWhatEverAndChangeSomeState() will finish before generateStreamOfMoreWhatEversDependendingOnMutatedState() start and the state would be updated.

I should note that this is probably not what the designers of Stream had in mind. Idealy a Stream should not change state, only read each item and produce a new item.

Upvotes: 1

Marko Topolnik
Marko Topolnik

Reputation: 200148

Your apparently assume that the Streams API extends its guarantees of laziness even to the instantiation of spliterators; this is not correct. It expects to be able to instantiate the stream's spliterator at any time before the actual consumption begins, for example just to find out the stream's characteristics and reported size. Consumption only begins by invoking trySplit, tryAdvance, or forEachRemaining.

Having that in mind, you are initializing the postponed sieve earlier than you need it. You don't get to use any of its results until the else if part in tryAdvance. So move the code to the last possible moment which gives correctness:

@Override
public boolean tryAdvance(IntConsumer action) {
    for (; c > 0 /* overflow */; c += 2) {
        Supplier<IntStream> maybeS = sieve.remove(c);
        if (maybeS != null)
            s = maybeS;
        else {
            if (postponedSieve == null) {
              postponedSieve = primes().iterator();
              postponedSieve.nextInt();
              this.p = postponedSieve.nextInt();
              this.q = p*p;
            }
            if (c < q) {
              action.accept(c);
              return true; //continue

I think that, with this change, even your first attempt at primes() should work.

If you want to stay with your current approach, you could involve the following idiom:

Stream.<Supplier<IntStream>>of(
  ()->IntStream.of(2, 3, 5, 7),
  ()->intStream(new PrimeSpliterator()))
.flatMap(Supplier::get);

You may find that this gives you as much laziness as you need.

Upvotes: 11

Related Questions