Kafka Streams Consumer Constantly Rebalances at Over 100k TPS
We have a Kafka Streams service that performs a KStream-KStream left join keyed on the message key. Messages are roughly 1 KB each.
The left topic receives around two hundred thousand (200,000) messages per second, while the right topic receives around two thousand (2,000) messages per second.
Each topic has 96 partitions with a retention time of 3 hours.
The join uses a 15-minute time window.
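For reference, the topology is roughly equivalent to the sketch below. The topic names, serdes, value joiner, and grace period are simplified placeholders, not our actual code:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

public class JoinTopologySketch {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Left stream: ~200k messages/s, ~1 KB each, keyed by message key.
        KStream<String, String> left =
                builder.stream("left-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Right stream: ~2k messages/s, same key space.
        KStream<String, String> right =
                builder.stream("right-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // KStream-KStream left join over a 15-minute window.
        // rightValue can be null for left-join results with no matching right record.
        KStream<String, String> joined = left.leftJoin(
                right,
                (leftValue, rightValue) -> leftValue + "|" + rightValue, // placeholder joiner
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(15)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

        joined.to("joined-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```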
Our most concerning problem is that the consumer group keeps rebalancing after some time, and the consumer lag can accumulate to 200 million messages or more.
We have done a lot of tuning and testing on our service, but the problem still persists. The consumers start rebalancing once we reach 100,000 TPS or more.
Below is the latest configuration we use (a sketch of how we apply it in code follows the list):
- auto.offset.reset = latest
- session.timeout.ms = 300000
- heartbeat.interval.ms = 75000 (1/4 of session.timeout.ms)
- fetch.max.wait.ms = 500
- fetch.min.bytes = 1048576
- fetch.max.bytes = 52428800
- max.partition.fetch.bytes = 1048576
- max.poll.interval.ms = 300000
- max.poll.records = 1
- request.timeout.ms = 120000
- default.api.timeout.ms = 60000
- cache.max.bytes.buffering = 104857600
- num.stream.threads = 48 (each pod)
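For context, this is roughly how that configuration is applied. The application id, bootstrap servers, and the use of the `consumer.` prefix via `StreamsConfig.consumerPrefix()` are an assumed, simplified setup rather than a copy of our service code:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsConfigSketch {

    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "left-join-service"); // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");  // illustrative
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 48);
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 104857600L);

        // Consumer-level settings are passed through with the "consumer." prefix.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest");
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 300000);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG), 75000);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG), 500);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MIN_BYTES_CONFIG), 1048576);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.FETCH_MAX_BYTES_CONFIG), 52428800);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG), 1048576);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), 300000);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 1);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG), 120000);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG), 60000);
        return props;
    }
}
```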
We have also enabled custom RocksDB configuration on the service side.
Below is the latest RocksDB configuration we use (a sketch of the RocksDBConfigSetter we wire it through follows the list):
- writeBufferSize = 67108864 (64MB)
- maxWriteBufferNumber = 3
- minWriteBufferNumberToMerge = 1
- blockCacheSize = 134217728 (128MB)
- increaseParallelism = 5
- compactionStyle = level
- totalOffHeapMemory = 16000000000 (16GB)
- totalMemtableSize = 8000000000 (8GB)
- compressionStyle = lz4
- cacheIndexAndFilterBlocksWithHighPriority = true
- pinL0FilterAndIndexBlocksInCache = true
- pinTopLevelIndexAndFilter = true
- optimizeFiltersForHits = false
- optimizeFiltersForMemory = false
- lruStrictCapacityLimit = false
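For completeness, these settings are applied through a custom `RocksDBConfigSetter` along the lines of the sketch below. The class name and the wiring of the global bounds (one shared LRUCache sized to totalOffHeapMemory plus a WriteBufferManager capped at totalMemtableSize, following the Kafka Streams memory-management docs) are simplifications; in particular, the 128 MB blockCacheSize, optimizeFiltersForMemory, and lruStrictCapacityLimit flags are folded into the shared cache here rather than shown exactly as we set them:

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.CompactionStyle;
import org.rocksdb.CompressionType;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class CustomRocksDBConfig implements RocksDBConfigSetter {

    // Shared across all state store instances in the JVM so the totals act as global bounds.
    private static final long TOTAL_OFF_HEAP_MEMORY = 16_000_000_000L; // 16 GB
    private static final long TOTAL_MEMTABLE_MEMORY = 8_000_000_000L;  // 8 GB
    private static final LRUCache CACHE =
            new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false); // strictCapacityLimit = false
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        options.setWriteBufferSize(67_108_864L); // 64 MB
        options.setMaxWriteBufferNumber(3);
        options.setMinWriteBufferNumberToMerge(1);
        options.setIncreaseParallelism(5);
        options.setCompactionStyle(CompactionStyle.LEVEL);
        options.setCompressionType(CompressionType.LZ4_COMPRESSION);
        options.setOptimizeFiltersForHits(false);
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);

        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE); // shared cache used here instead of a per-store 128 MB cache
        tableConfig.setCacheIndexAndFilterBlocks(true);
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinL0FilterAndIndexBlocksInCache(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache and write buffer manager are shared, so they are not closed per store.
    }
}
```

It is registered with `props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);`.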
Below is some information about the Kafka cluster we are using:
- 10 brokers (8 core 2.3GHz, 16GB RAM)
- kafka version = 3.8.0
- kafka-streams library = 3.1.2
We have several bare-metal servers dedicated to deploying the service, each with 16 cores and 32 GB RAM.
The service runs in dockerized containers with 2 pods (we have also tried up to 4 pods).
Could someone provide insight or solutions so we can have a stable and fast consumer that handles this kind of load?
We have done a lot of configuration tuning, especially on the consumer configuration stated above, following many performance-tuning articles on the web.
We have tried:
- setting session.timeout.ms from 30000 up to 300000
- setting max.poll.interval.ms from 60000 up to 300000
- setting max.poll.records to 500, 250, 100, 50, 20, and 1
- setting request.timeout.ms from 30000 up to 120000
- deploying the service with 2, 3, and 4 pods, with 96 partitions per topic and num.stream.threads equal to [partitions / number_of_pods]
- setting totalOffHeapMemory to 4 GB, 8 GB, and 16 GB, and totalMemtableSize to 2 GB, 4 GB, and 8 GB
- setting writeBufferSize from 32MB to 128MB
- setting true and false for cacheIndexAndFilterBlocksWithHighPriority, pinL0FilterAndIndexBlocksInCache, and pinTopLevelIndexAndFilter