Reputation: 118
I have a Kafka cluster on 3 servers. There is a topic with one partition and 3 replicas. The average message is about 200 bytes.
I want multiple consumers (i.e. with different group IDs) to read from the topic, so that each consumer receives all the data.
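Each consumer is just a plain subscriber with its own group.id. A minimal sketch of one such consumer, using confluent-kafka (the Python binding of librdkafka), looks roughly like this; the bootstrap servers other than kafka3 are placeholders for my actual hosts:

from confluent_kafka import Consumer

# Minimal consumer; every instance uses a different group.id, so each one
# receives the full stream from the single partition of test-topic.
consumer = Consumer({
    'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',  # kafka1/kafka2 are placeholders
    'group.id': 'group-1',               # unique per consumer process
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['test-topic'])

count = 0
try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        count += 1                       # ~200-byte payloads, just counted here
finally:
    consumer.close()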
The problem is that each new consumer is slower than the previous one, so after adding about 20 consumers the newest ones become very slow.
The following table shows the problem:
topic      consumer group   current offset
topic-0 group-1 4191232
topic-0 group-4 3860979
topic-0 group-2 3799224
topic-0 group-12 2112518
topic-0 group-7 1984491
topic-0 group-3 1842349
topic-0 group-6 1695504
topic-0 group-11 1388133
topic-0 group-5 1383794
topic-0 group-19 1242424
topic-0 group-16 941960
topic-0 group-14 876551
topic-0 group-22 837359
topic-0 group-21 828698
topic-0 group-13 811273
topic-0 group-26 716414
topic-0 group-9 699175
topic-0 group-18 621772
topic-0 group-15 617520
topic-0 group-17 613233
topic-0 group-10 388891
topic-0 group-8 328258
topic-0 group-24 233805
topic-0 group-29 131299
topic-0 group-23 84658
topic-0 group-20 80492
topic-0 group-27 63527
topic-0 group-25 50720
topic-0 group-28 46474
topic-0 group-30 37958
All consumers were started at almost the same time, and this state was captured after about 20 seconds: group-1 had read 4.19 million records, while group-30 had read only 37,958.
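As a rough estimate (using the ~200-byte average message size from above), group-1's 4.19 million records are about 840 MB in 20 seconds, i.e. a single fast consumer pulls on the order of 40 MB/s (~330 Mbit/s) from the brokers.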
The distribution among consumers differs from run to run, but there are always slow consumers.
I've tried running the consumers on dedicated servers and locally on the Kafka cluster nodes; the situation didn't change.
Log messages on the slow consumers show that the round-trip time is high, sometimes more than a second:
kafka3:9092/3: Sent FetchRequest (v4, 93 bytes @ 0, CorrId 36322)
kafka3:9092/3: Received FetchResponse (v4, 1048636 bytes, CorrId 36322, rtt 747.24ms)
The problem is reproducible with both the Kafka console consumer and librdkafka, so I think something is wrong on the broker side.
I've set the num.io.threads and num.network.threads parameters in the broker configs to 32, but it didn't help. All other parameters are at their defaults.
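For completeness, these are the only non-default lines in each broker's server.properties:

# server.properties (per broker); everything else is stock
num.network.threads=32
num.io.threads=32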
Any help will be appreciated.
UPDATE 1
A log message on the broker for a slow consumer shows that the problem is definitely on the broker side:
[2018-03-07 12:58:42,787] DEBUG Completed request:RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=1, clientId=rdkafka, correlationId=376) -- {group_id=group-12,generation_id=13,member_id=rdkafka-5c08ffd4,topics=[{topic=test-topic,partitions=[{partition=0,offset=651909,timestamp=-1,metadata=}]}]},response:{responses=[{topic=test-topic,partition_responses=[{partition=0,error_code=0}]}]} from connection kafka3:9092-client12:37884-10;totalTime:1547.433,requestQueueTime:0.104,localTime:0.631,remoteTime:1546.48,throttleTime:0.019,responseQueueTime:0.046,sendTime:0.15,securityProtocol:PLAINTEXT,principal:User:ANONYMOUS,listener:PLAINTEXT (kafka.request.logger)
remoteTime is about 1.5 seconds.
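For reference, this per-request timing comes from the broker's request logger; if I remember the stock log4j.properties correctly, it is enabled by raising the level of kafka.request.logger, roughly like this:

# config/log4j.properties on the broker
log4j.logger.kafka.request.logger=DEBUG, requestAppender
log4j.additivity.kafka.request.logger=false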
So the question is: where should I look on the broker side to resolve this?
Upvotes: 2
Views: 2134
Reputation: 118
The problem is that the consumers saturate all of the available network bandwidth on the broker server.
Kafka apparently sends responses to consumers in some fixed order (by connection time, as far as I can tell). So we get a few very fast consumers and a bunch of consumers with reasonable speed, while the rest are slow; only disconnecting the "fast" consumers helps them.
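A rough back-of-the-envelope check with the numbers from the question (so only an estimate): each consumer group needs its own full copy of the stream, and the fast consumers read on the order of 40 MB/s. Serving all 30 groups at that rate would require roughly 30 × 40 MB/s ≈ 1.2 GB/s (close to 10 Gbit/s) of egress from the brokers, far more than a 1 Gbit/s NIC can deliver (assuming that is what these servers have), so whichever consumers are served last get starved.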
Upvotes: 1