Good afternoon, everyone. I'm setting up ClickHouse to read data from a Kafka topic that consists of 24 partitions. ClickHouse reads it through a table with the Kafka engine, created with the following settings:
CREATE TABLE logs.log_kafka_v2 (
    `_partition` UInt64,
    `_offset` UInt64,
    `user_id` String,
    `content_id` String,
    `scenario` String,
    `node_context_key` String,
    `is_result_node` String,
    `content` String,
    `score` String,
    `position` Int8,
    `request_id` String,
    `response_ts` String
) ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka-navigator-prod:443',
    kafka_topic_list = 'log',
    kafka_group_name = 'log-click-consumer-test',
    kafka_format = 'JSONEachRow',
    kafka_thread_per_consumer = 1,
    kafka_num_consumers = 24,
    kafka_poll_max_batch_size = 800000,
    kafka_commit_every_batch = 1,
    kafka_poll_timeout_ms = 2200;
At the moment this configuration reads data from 18 of the partitions at a good speed, but the remaining few (6-8) have a very large consumer lag, as you can see from the screenshot.
My topic receives about 400,000 - 500,000 messages per second. Has anyone encountered this situation? What settings should be changed so that none of the partitions lag? Thanks in advance!
I've tried different settings for the database and for ClickHouse itself. The database is hosted in the cloud as SaaS, and Kafka runs on Kubernetes.
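For anyone who wants to check how the 24 partitions are distributed across the consumers of this table, a query along these lines might help. It is only a sketch: it assumes a ClickHouse version that provides the `system.kafka_consumers` system table, which I have not confirmed is available on this SaaS installation, and I have no output from it to share.

```sql
-- Sketch: list each consumer of the Kafka table with the partitions it holds.
-- Assumes system.kafka_consumers exists in the running ClickHouse version.
SELECT
    consumer_id,
    assignments.partition_id   AS partitions,      -- partitions assigned to this consumer
    assignments.current_offset AS current_offsets, -- current offset for each of those partitions
    num_messages_read,
    last_poll_time,
    is_currently_used
FROM system.kafka_consumers
WHERE database = 'logs'
  AND table = 'log_kafka_v2'
ORDER BY consumer_id;
```

If the lagging partitions turn out to share a consumer, or some consumers hold no partitions at all, that would point to uneven assignment rather than to raw throughput.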