spl

Spring WebFlux BlockingIterator.hasNext() blocking forever

I am trying to track down a bug where my Spring WebFlux app hangs under load. I have been debugging this for many hours, so any help would be appreciated, even just a direction to investigate.

Problem:

myFlux = client.get()
    .uri(uriBuilder.get())
    .header(...)
    ...
    .exchangeToFlux(clientResponse -> {
        // returns .bodyToFlux(ServerSentEvent<String>) or Flux.error(new Ex) depending on status code
    })
    .doOnError(t -> log.error(t))
    .onErrorMap(Error.class, e -> {    // *1, below
        log.error(e);
        return new RuntimeException("Wrapped Error in RuntimeException", e);
    })
    ;

// later ...

Iterator<ServerSentEvent<String>> iterator = myFlux.toIterable().iterator(); // toIterable() returns a BlockingIterable

while(iterator.hasNext()) { // hangs here
    ServerSentEvent<String> line = iterator.next();
    ...
}

Basically, my iterator (which wraps the flux) happily returns all the lines from my HTTP response for about 3 hours under load test. Then at some point iterator.hasNext() blocks, presumably because the flux is expecting more lines from the WebClient but doesn't have them yet. Unfortunately it never unblocks, and the worker thread is stuck for good.

I tried putting a read timeout on the underlying HTTP client in case the connection was just hanging and holding the flux and iterator in a waiting-for-more state, but to no avail.
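
One way to narrow this down is to bound the wait on the blocking side, so that a stalled stream fails loudly instead of parking the worker thread. The following is a minimal JDK-only sketch, not part of Reactor: TimedIterator is a hypothetical helper that runs the wrapped iterator's hasNext() on a separate thread and gives up after a timeout. It assumes the wrapped iterator tolerates hasNext() being called from another thread and reacts to interruption; whether that holds for the iterator returned by toIterable() on a given reactor-core version is an assumption to verify.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds how long hasNext() may block on a wrapped iterator. */
final class TimedIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;
    private final Duration timeout;
    // Daemon thread, so a hasNext() call that never returns cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "timed-iterator");
        t.setDaemon(true);
        return t;
    });

    TimedIterator(Iterator<T> delegate, Duration timeout) {
        this.delegate = delegate;
        this.timeout = timeout;
    }

    @Override
    public boolean hasNext() {
        Callable<Boolean> check = delegate::hasNext;
        Future<Boolean> pending = executor.submit(check);
        try {
            return pending.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the blocked hasNext() call
            throw new IllegalStateException(
                    "No element or terminal signal within " + timeout, e);
        } catch (InterruptedException e) {
            pending.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("Interrupted while waiting for the next element", e);
        } catch (ExecutionException e) {
            // The wrapped iterator itself threw; surface the original cause.
            throw new IllegalStateException("Source failed", e.getCause());
        }
    }

    @Override
    public T next() {
        return delegate.next();
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Wrapping the iterator from the snippet above would look like new TimedIterator<>(myFlux.toIterable().iterator(), Duration.ofSeconds(30)), where the 30 second value is arbitrary. A timeout firing here would at least confirm that neither an element nor a completion or error signal is reaching the iterator.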

I then read this issue, which suggests a VM error as a possible cause:

https://github.com/reactor/reactor-core/issues/3036

The symptom sounds like mine, so I added the onErrorMap marked *1 above to catch any java.lang.Error (including VM errors) that the flux might be concealing and that could leave it in a corrupt state. The issue says: "Reactor doesn't attempt to recover from a VirtualMachineError, of which OutOfMemoryError is a subclass. When such Errors occur, there is no telling in which state the JVM is and whether or not all components of the application will be able to recover. Generally, they won't."

However, the symptoms remain unchanged. I also note that I am on reactor-core 3.4.25, so the fix from the related ticket (https://github.com/reactor/reactor-core/issues/3111), which surfaces the error in the logs, should already be included.

So I'm trying to decide whether I'm hitting a fatal JVM error (a java.lang.Error) and, if so, why it isn't showing up anywhere in the logs, or whether there is a different cause or avenue to investigate.
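
To check whether a throwable is escaping a thread without being logged, a JVM-wide uncaught exception handler can be installed at startup. This is a minimal sketch using only the JDK; FatalErrorLogging and its LAST field are hypothetical names, and the handler only sees throwables that actually propagate out of a thread's run() method. Whether a fatal error raised on Reactor or Netty threads reaches this handler is an assumption to verify, since frameworks may catch and handle throwables themselves.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical diagnostic helper: logs any throwable that terminates a thread. */
final class FatalErrorLogging {
    // Last throwable seen by the handler; kept only so the sketch can be checked.
    static final AtomicReference<Throwable> LAST = new AtomicReference<>();

    private FatalErrorLogging() {
    }

    /** Installs a default handler for threads that have no handler of their own. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            LAST.set(throwable);
            // Plain stderr on purpose: a logging framework may itself be
            // unusable after something like an OutOfMemoryError.
            System.err.println("Uncaught in thread " + thread.getName() + ": " + throwable);
            throwable.printStackTrace();
        });
    }
}
```

Calling FatalErrorLogging.install() once, early in application startup, is enough. If nothing is reported while the hang still reproduces, that points away from an uncaught Error killing a thread and toward a signal that is simply never delivered.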
