Reputation: 81

Kafka latency optimization

My Kafka version is 0.10.2.1. My service has a very low QPS (1 msg/sec), and our RTT requirement is strict (99.9% < 30 ms). I've run into a problem: after Kafka has been running for a long time, 15 days or so, performance starts to degrade. On 2017-10-21 the latency distribution looked like this:

Cost                  Messages      Percentage
cost<=2ms             0             0.000%
2ms<cost<=5ms         12391         32.659%
5ms<cost<=8ms         25327         66.754%
8ms<cost<=10ms        186           0.490%
10ms<cost<=15ms       24            0.063%
15ms<cost<=20ms       2             0.005%
20ms<cost<=30ms       0             0.000%
30ms<cost<=50ms       4             0.011%
50ms<cost<=100ms      1             0.003%
100ms<cost<=200ms     0             0.000%
200ms<cost<=300ms     6             0.016%
300ms<cost<=500ms     0             0.000%
500ms<cost<=1s        0             0.000%
cost>1s               0             0.000%

But recently it became:

cost<=2ms             0             0.000%
2ms<cost<=5ms         7592          29.202%
5ms<cost<=8ms         17470         67.197%
8ms<cost<=10ms        698           2.685%
10ms<cost<=15ms       143           0.550%
15ms<cost<=20ms       23            0.088%
20ms<cost<=30ms       19            0.073%
30ms<cost<=50ms       11            0.042%
50ms<cost<=100ms      5             0.019%
100ms<cost<=200ms     11            0.042%
200ms<cost<=300ms     26            0.100%
300ms<cost<=500ms     0             0.000%
500ms<cost<=1s        0             0.000%
cost>1s               0             0.000%

When I check the logs, I don't see a way to find out why a specific message has a high RTT. If there is any way to optimize this (OS tuning, broker config), please enlighten me.
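
As a rough check of the two distributions against the 99.9% < 30 ms target, the bucket counts can be summed directly. This is only a sketch: the counts are copied from the tables above, while the names `BOUNDS_MS`, `OCT_21`, `RECENT` and `share_within` are made up for illustration. It treats every message in a bucket as no slower than that bucket's upper bound.

```python
# Upper bound of each latency bucket in ms, in table order.
# None marks the open-ended "cost>1s" bucket.
BOUNDS_MS = [2, 5, 8, 10, 15, 20, 30, 50, 100, 200, 300, 500, 1000, None]

# Message counts per bucket, copied from the two tables in the question.
OCT_21 = [0, 12391, 25327, 186, 24, 2, 0, 4, 1, 0, 6, 0, 0, 0]
RECENT = [0, 7592, 17470, 698, 143, 23, 19, 11, 5, 11, 26, 0, 0, 0]


def share_within(counts, limit_ms):
    """Fraction of messages in buckets whose upper bound is <= limit_ms."""
    total = sum(counts)
    within = sum(
        count
        for bound, count in zip(BOUNDS_MS, counts)
        if bound is not None and bound <= limit_ms
    )
    return within / total


for label, counts in (("2017-10-21", OCT_21), ("recent", RECENT)):
    share = share_within(counts, 30)
    print(f"{label}: {share:.3%} within 30 ms, target met: {share >= 0.999}")
```

With these counts, the 2017-10-21 sample comes out at about 99.971% within 30 ms (11 of 37941 messages slower), while the recent sample comes out at about 99.796% (53 of 25998 slower), so only the first one meets the target.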

Upvotes: 2

Answers (2)

Guozhang Wang

Reputation: 489

Without a breakdown of the request handling time, it is hard to tell which part may be the culprit. More specifically, you'll need to hook up JMX and check the following request-level metrics:

TotalTimeMs
RequestQueueTimeMs
LocalTimeMs
RemoteTimeMs
ResponseQueueTimeMs
ResponseSendTimeMs

https://kafka.apache.org/documentation/#monitoring

Check their average and 99th-percentile values over time and see which one is contributing to the performance degradation.
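
A minimal sketch of that comparison, assuming you have already exported two snapshots of the component metrics (for example with a JMX tool of your choice). The MBean name pattern follows the monitoring page linked above; the helper names and all numbers below are hypothetical and are not measurements from the question.

```python
# Component metrics from the list above (TotalTimeMs is the overall figure).
COMPONENTS = [
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "RemoteTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
]


def mbean_name(metric, request="Produce"):
    """JMX object name of a broker request metric for one request type."""
    return f"kafka.network:type=RequestMetrics,name={metric},request={request}"


def biggest_regression(before, after):
    """Return (metric, increase_ms) for the component that grew the most."""
    deltas = {m: after[m] - before[m] for m in COMPONENTS}
    worst = max(deltas, key=deltas.get)
    return worst, deltas[worst]


# Hypothetical 99th-percentile readings in ms, for illustration only.
healthy = {
    "RequestQueueTimeMs": 1.0,
    "LocalTimeMs": 2.0,
    "RemoteTimeMs": 1.0,
    "ResponseQueueTimeMs": 0.5,
    "ResponseSendTimeMs": 0.5,
}
degraded = dict(healthy, LocalTimeMs=40.0)

metric, increase = biggest_regression(healthy, degraded)
print(mbean_name(metric), f"grew by {increase} ms")
```

In this made-up case the regression sits in LocalTimeMs, which the monitoring docs describe as the time the request is processed at the leader; a jump in RemoteTimeMs would instead point at time spent waiting for followers.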

Upvotes: 2

Robin Moffatt

Reputation: 32130

Consider upgrading to 0.11 (or 1.0.0), which includes performance improvements.
Optimisation article: https://www.confluent.io/blog/optimizing-apache-kafka-deployment/

Upvotes: 0
