Reputation: 41
On a 100Gb network, I created a server listening on 4 ports, and the gRPC client can reach 3GB+/s throughput. However, when the server listens on only one port, the gRPC client reaches just 1GB/s, even when I set:
args.SetInt(GRPC_ARG_HTTP2_STREAM_LOOKAHEAD_BYTES, 1024*1024*1024);
args.SetInt(GRPC_ARG_MAX_CONCURRENT_STREAMS, 10);
It seems that a gRPC client can use only one connection at a time to a service on a single port. Am I right?
What's the correct way of doing it?
My code is here:
client: https://github.com/gongweibao/tests/blob/develop/grpc_test/client.cc
server: https://github.com/gongweibao/tests/blob/develop/grpc_test/server.cc
Upvotes: 4
Views: 2919
Reputation: 268
It is hard to say exactly where your program is getting bottlenecked without more data (like flamegraphs and such).
Since this occurs when you change the gRPC server to listen on one port, I can make some guesses as to where the slowdown is. It looks like you request a server-side call at the top of the proceed loop (1). I would advise a different pattern: request some fixed number of calls (in the 100s) up front, then at the end of the handler loop re-request a call, so that the server is always "armed" to receive many incoming RPCs.
An example of this pattern can be found in our QPS driver code (a finely tuned, highly optimized benchmarking application) (2).
TensorFlow also does it this way (3).
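To make the "armed" idea concrete, here is a minimal toy model of it. This is only a sketch and does not use the gRPC API: `ToyServer`, `Arm`, `Arrive`, `FinishAndRearm`, and `WaitingAfterBurst` are hypothetical names invented for illustration, and a "slot" stands in for one outstanding call request on the server's completion queue. It assumes an incoming RPC can be picked up immediately only if a slot is already armed.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: this is NOT the gRPC API. A "slot" stands in for one
// outstanding request-a-call registration on the server's completion queue.
struct ToyServer {
  std::size_t armed = 0;      // slots waiting for an incoming RPC
  std::size_t in_flight = 0;  // RPCs matched to a slot and being handled
  std::size_t queued = 0;     // RPCs that arrived while no slot was armed

  // Register n more slots, then hand any waiting RPCs to the free slots.
  void Arm(std::size_t n) {
    armed += n;
    while (queued > 0 && armed > 0) {
      --queued;
      --armed;
      ++in_flight;
    }
  }

  // An RPC arrives: it is picked up at once only if a slot is armed.
  void Arrive() {
    if (armed > 0) {
      --armed;
      ++in_flight;
    } else {
      ++queued;
    }
  }

  // End of the handler: finish one RPC and immediately re-request a call.
  void FinishAndRearm() {
    assert(in_flight > 0);
    --in_flight;
    Arm(1);
  }
};

// Number of RPCs left waiting after a burst, given how many slots
// were pre-armed before the burst arrived.
std::size_t WaitingAfterBurst(std::size_t prearmed, std::size_t burst) {
  ToyServer server;
  server.Arm(prearmed);
  for (std::size_t i = 0; i < burst; ++i) server.Arrive();
  return server.queued;
}
```

Under this model, `WaitingAfterBurst(1, 8)` leaves 7 RPCs waiting for a slot, while `WaitingAfterBurst(100, 8)` leaves none. Because `FinishAndRearm` re-arms one slot per finished RPC, the number of armed plus in-flight slots stays equal to the number pre-armed at startup.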
Also, a few small spot checks from reading over your code: there are some places you might consider tuning to get better numbers. For example, you might want to allocate only once here (4), to avoid benchmarking repeated malloc calls per RPC. Also, why do you do custom serialization to a ByteBuffer here (5)? That might miss out on proto-specific optimizations.
Upvotes: 5