Reputation: 81
My kafka version is 0.10.2.1. My service have really low qps (1msg/sec). And our requirement for rtt is really strict. ( 99.9% < 30ms) Currently I've encounter a problem, when kafka run for a long time, 15 days or so, performance start to go down. 2017-10-21 was like
Time . num of msgs . percentage
cost<=2ms 0 0.000%
2ms<cost<=5ms 12391 32.659%
5ms<cost<=8ms 25327 66.754%
8ms<cost<=10ms 186 0.490%
10ms<cost<=15ms 24 0.063%
15ms<cost<=20ms 2 0.005%
20ms<cost<=30ms 0 0.000%
30ms<cost<=50ms 4 0.011%
50ms<cost<=100ms 1 0.003%
100ms<cost<=200ms 0 0.000%
200ms< cost<=300ms 6 0.016%
300ms<cost<=500ms 0 0.000%
500ms<cost<=1s 0 0.000%
cost>1s 0 0.000%
But recently, it became :
cost<=2ms 0 0.000%
2ms<cost<=5ms 7592 29.202%
5ms<cost<=8ms 17470 67.197%
8ms<cost<=10ms 698 2.685%
10ms<cost<=15ms 143 0.550%
15ms<cost<=20ms 23 0.088%
20ms<cost<=30ms 19 0.073%
30ms<cost<=50ms 11 0.042%
50ms<cost<=100ms 5 0.019%
100ms<cost<=200ms 11 0.042%
200m s<cost<=300ms 26 0.100%
300ms<cost<=500ms 0 0.000%
500ms<cost<=1s 0 0.000%
cost>1s 0 0.000%
When I check the log, I don't see a way to check the reason why a specific message have a high rtt. And if there's any way to optimize(OS tune, broker config), please enlighten me
Upvotes: 2
Views: 2783
Reputation: 489
Without the request handling time break-down it is hard to tell which part maybe the culprit of your issue. More specifically you'll need to hook up your jmx and check the following request-level metrics:
TotalTimeMs RequestQueueTimeMs LocalTimeMs RemoteTimeMs ResponseQueueTimeMs ResponseSendTimeMs
https://kafka.apache.org/documentation/#monitoring
Check their avg / 99 percentile value over time and see which one is contributing to the perf degradation.
Upvotes: 2
Reputation: 32130
Consider upgrading to 0.11 (or 1.00) which has performance improvements in it
Optimisation article: https://www.confluent.io/blog/optimizing-apache-kafka-deployment/
Upvotes: 0