shizhz

Reputation: 12521

Istio sidecar causes Java gRPC client to throw "UNAVAILABLE: upstream connect error or disconnect/reset before headers" under high-concurrency load

I have two gRPC services, and one calls the other through a normal (unary) gRPC method (no streaming on either side). I'm using Istio as the service mesh and have the sidecar injected into the Kubernetes pods of both services.

The gRPC call works correctly under normal load, but under high-concurrency load the gRPC client keeps throwing the following exception:

<#bef7313d> i.g.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers
    at io.grpc.Status.asRuntimeException(Status.java:526)
    at i.g.s.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at i.g.i.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at i.g.i.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at i.g.i.ContextRunnable.run(ContextRunnable.java:37)
    at i.g.i.SerializingExecutor.run(SerializingExecutor.java:123)
    at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Meanwhile, there's no exception on the server side, and no error in the istio-proxy container of the client's pod either. But if I disable Istio sidecar injection so that the two services talk to each other directly, there are no such errors.

Could somebody kindly tell me why this happens, and how to resolve it?

Thanks a lot.

Upvotes: 2

Views: 3399

Answers (1)

shizhz

Reputation: 12521

Finally I found the reason: it's caused by the default circuit breaker settings of the Envoy sidecar. By default, the options max_pending_requests and max_requests are set to 1024, and the default connectTimeout is 1s. So under high-concurrency load, when the server side has too many pending requests waiting to be served, the sidecar's circuit breaker kicks in and tells the client that the server-side upstream is UNAVAILABLE.

To fix this problem you need to apply a DestinationRule for the target service with reasonable trafficPolicy settings.
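For example, here is a minimal sketch of such a DestinationRule. The service name, host, and limit values are placeholders you would tune for your own workload; the connectionPool fields override the Envoy defaults mentioned above:

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: my-grpc-server              # hypothetical name
    spec:
      host: my-grpc-server              # replace with the target service's host
      trafficPolicy:
        connectionPool:
          tcp:
            connectTimeout: 5s          # raise the 1s default connect timeout
          http:
            http1MaxPendingRequests: 10000   # maps to Envoy's max_pending_requests (default 1024)
            http2MaxRequests: 10000          # maps to Envoy's max_requests (default 1024)

Apply it with kubectl in the namespace of the target service, and the client-side sidecar will use the relaxed circuit breaker limits instead of the defaults.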

Upvotes: 6
