Python gRPC deadline exceeded errors in large percentages

Question

I'm getting a lot of deadline exceeded errors in a Python gRPC client calling a Scala gRPC server.

I'm reporting metrics from both the client and the server, and there is a large discrepancy between the server-reported time and the client-reported time. I don't think network latency alone can explain it, because the variance is large. The returned objects are of similar size, so I would assume serialization time is negligible compared to network time.

I've set the timeout to 20 ms.

My client code is simple:

import time
import grpc

self.channel = grpc.insecure_channel(...)
self.stub = MyService_pb2_grpc.MyServiceStub(self.channel)
timeout = 0.02  # seconds, i.e. 20 ms
try:
  start = time.time()  # time.time() returns seconds, not milliseconds
  grpc_res = self.stub.getFoo(Request(...), timeout=timeout)
  end = time.time()
  total_duration_ms = int((end - start) * 1000)
....
except grpc.RpcError as e:  # only grpc.RpcError provides .code()
  status_code = e.code().name  # e.g. 'DEADLINE_EXCEEDED'
  logger.error('exception ....: %s', status_code) # around 20% deadline exceptions

My server code reports 5 ms on average and the client code reports 7 ms on average, but, as mentioned, about 20% of calls hit the 20 ms timeout.
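One caveat about these numbers: the client snippet appears to record a duration only when the call succeeds, so the 7 ms average would not include the calls that timed out, and an average hides the tail in any case. Below is a minimal sketch of how the client-side distribution could be measured instead, timing failed calls too and reporting percentiles. The helper names `timed_call`, `percentile` and `summarize` are hypothetical and not part of gRPC; the sketch uses only the standard library.

```python
import math
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, error, elapsed_ms); failed calls are timed too."""
    start = time.perf_counter()
    try:
        result, error = fn(*args, **kwargs), None
    except Exception as exc:  # in the real client this would be grpc.RpcError
        result, error = None, exc
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, error, elapsed_ms


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(durations_ms):
    """Summarize latencies so that the tail is visible next to the mean."""
    return {
        "mean": sum(durations_ms) / len(durations_ms),
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
        "max": max(durations_ms),
    }
```

In the client, the stub call would be wrapped as `timed_call(self.stub.getFoo, Request(...), timeout=timeout)` and every `elapsed_ms` appended to a list, including those where `error` is set. Comparing p95/p99 on the client with the same percentiles on the server should show whether the gap is in the tail rather than in the average.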

Is there a way to debug the root cause of this problem, e.g. with lower-level logging?
