amro4934

Reputation: 67

gRPC requests getting cancelled by client, despite infinite timeouts everywhere

I'm running some tests with a gRPC client/server. The code is based on the official Python examples: https://github.com/grpc/grpc/blob/master/examples/python/helloworld/async_greeter_server.py and https://github.com/grpc/grpc/blob/master/examples/python/helloworld/async_greeter_client.py

My server benefits a lot from aggregated requests, so I batch many requests together, which makes the payload quite heavy and pushes processing times to around 20 seconds.

I'm doing some load tests with 1 server (local) and 20 clients. All calls are asynchronous using asyncio.

The first requests go through, but after a minute or so, everything fails with:

    Received error: %s <AioRpcError of RPC that terminated with:
        status = StatusCode.CANCELLED
        details = "CANCELLED"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"CANCELLED", grpc_status:1,     created_time:"2024-07-22T10:29:05.205259898+02:00"}"

I looked everywhere and can't understand where this is coming from. Why does the request get cancelled and by whom?

For instance, adding this didn't solve anything:

    server_options = [
        ("grpc.keepalive_time_ms", 99999999),
        ("grpc.keepalive_timeout_ms", 9999999),
        ("grpc.http2.min_ping_interval_without_data_ms", 9999999),
        ("grpc.max_connection_idle_ms", 99999999),
        ("grpc.max_connection_age_ms", 99999999),
        ("grpc.max_connection_age_grace_ms", 99999999),
        ("grpc.http2.max_pings_without_data", 5),
        ("grpc.keepalive_permit_without_calls", 1),
    ]
    client_options = [
        ('grpc.max_send_message_length', 50 * 1024 * 1024),  # 50 MB
        ('grpc.max_receive_message_length', 50 * 1024 * 1024),  # 50 MB
        ("grpc.keepalive_time_ms", 999999999),
        ("grpc.keepalive_timeout_ms", 99999999),
        ("grpc.http2.max_pings_without_data", 5),
        ("grpc.keepalive_permit_without_calls", 1)
    ]
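For context, here is a minimal sketch of how these option lists are wired in, assuming the same layout as the helloworld examples. The helper names `make_server` and `make_channel` and the port `50051` are placeholders for illustration, not my exact code; `grpc` is imported lazily only so the lists can be inspected without `grpcio` installed.

```python
# Same values as in the lists above, repeated so the snippet is self-contained.
SERVER_OPTIONS = [
    ("grpc.keepalive_time_ms", 99999999),
    ("grpc.keepalive_timeout_ms", 9999999),
    ("grpc.http2.min_ping_interval_without_data_ms", 9999999),
    ("grpc.max_connection_idle_ms", 99999999),
    ("grpc.max_connection_age_ms", 99999999),
    ("grpc.max_connection_age_grace_ms", 99999999),
    ("grpc.http2.max_pings_without_data", 5),
    ("grpc.keepalive_permit_without_calls", 1),
]

CLIENT_OPTIONS = [
    ("grpc.max_send_message_length", 50 * 1024 * 1024),  # 50 MB
    ("grpc.max_receive_message_length", 50 * 1024 * 1024),  # 50 MB
    ("grpc.keepalive_time_ms", 999999999),
    ("grpc.keepalive_timeout_ms", 99999999),
    ("grpc.http2.max_pings_without_data", 5),
    ("grpc.keepalive_permit_without_calls", 1),
]


def make_server(address="[::]:50051"):
    """Create an asyncio gRPC server with the server options (hypothetical helper)."""
    import grpc  # lazy import: requires the grpcio package

    server = grpc.aio.server(options=SERVER_OPTIONS)
    server.add_insecure_port(address)
    return server


def make_channel(target="localhost:50051"):
    """Create an asyncio gRPC channel with the client options (hypothetical helper)."""
    import grpc  # lazy import: requires the grpcio package

    return grpc.aio.insecure_channel(target, options=CLIENT_OPTIONS)
```

Both helpers are meant to be called from inside the running coroutine, as in the examples (`serve()` on the server side, `run()` on the client side).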

I know this is bad practice, and I will definitely not be doing this in production, but I need to understand why this error happens so I can prevent it in the future.
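To narrow down which side cancels, one thing I can do is log cancellations inside the server handler. The sketch below is only a diagnostic aid and rests on an assumption I have not verified: that `grpc.aio` cancels the handler coroutine (raising `asyncio.CancelledError`) when the RPC is cancelled. The decorator `log_cancellation`, the handler `slow_handler`, and `demo` are hypothetical names, and the demo uses plain asyncio with a shortened sleep instead of a real RPC.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("rpc_cancel_probe")


def log_cancellation(handler):
    """Wrap an async handler so a cancellation is logged and then re-raised."""

    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        started = loop.time()
        try:
            return await handler(*args, **kwargs)
        except asyncio.CancelledError:
            # Record how long the handler ran before it was cancelled.
            logger.warning(
                "%s cancelled after %.2fs", handler.__name__, loop.time() - started
            )
            raise  # never swallow the cancellation

    return wrapper


@log_cancellation
async def slow_handler(request, context=None):
    # Stand-in for the ~20 s aggregated processing (shortened for the demo).
    await asyncio.sleep(0.5)
    return f"Hello, {request}!"


async def demo():
    """Start the handler, cancel it midway, and report whether it was cancelled."""
    task = asyncio.create_task(slow_handler("world"))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False
```

On the real servicer the decorator would go on the `async def SayHello(self, request, context)` method. If the assumption holds, the elapsed time in the log should show whether cancellations line up with a fixed delay or with the moment a client goes away.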

Upvotes: 0

Views: 301

Answers (0)
