I need to get data from a Kafka queue (filled by my script) to every replica in a ClickHouse (CH) cluster.
I've created:
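Roughly the following (a sketch — the names queue and data_all, the columns, and the broker/cluster names are placeholders; data and the view consumer are the real names):

```sql
-- Kafka engine table that reads from the topic (placeholder broker/topic).
CREATE TABLE queue (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka_host:9092', 'topic', 'group1', 'JSONEachRow');

-- Replicated storage table, present on every replica.
CREATE TABLE data (
    ts DateTime,
    value String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/data', '{replica}')
ORDER BY ts;

-- Materialized view that moves rows from the Kafka table into storage.
CREATE MATERIALIZED VIEW consumer TO data
AS SELECT ts, value FROM queue;

-- Distributed table for cluster-wide SELECTs (placeholder cluster name).
CREATE TABLE data_all AS data
ENGINE = Distributed(my_cluster, default, data, rand());
```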
While I'm putting data into Kafka, I'm fairly sure the tables accept data (a simple SELECT count(*) FROM data shows rows arriving), but I always get this:
"Progress: 1.55 thousand rows, 1.24 MB (297.46 rows/s., 237.18 KB/s.) Received exception from server (version 18.14.17): Code: 159. DB::Exception: Received from host:port. DB::Exception: Failed to claim consumer: . 0 rows in set. Elapsed: 5.313 sec. Processed 1.55 thousand rows, 1.24 MB (291.94 rows/s., 232.78 KB/s.)"
When I stop filling Kafka, there's a short window during which my query completes. But after a few seconds I get 0 counts on every table I've created.
While the approach shared by Keyzj works and is useful when you want to use the distributed table to control sharding, there's another approach that improves throughput and reliability.
You can create Kafka tables on all machines in your cluster, with a separate materialized view feeding the local table on each host. As long as the same consumer group name is used in the Kafka table definition, Kafka internals will ensure that each host consumes from unique partitions. Make sure the number of Kafka partitions is 2-3x the number of hosts. For instance, if you have 4 hosts in your cluster, each host will be set up like this:
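For example (a sketch; the names, columns, and broker address are placeholders — the part that matters is the identical consumer group name on every host):

```sql
-- Run on each of the 4 hosts with identical definitions.
CREATE TABLE kafka_queue (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka_host:9092', 'topic',
                 'clickhouse_group',  -- same consumer group on every host
                 'JSONEachRow');

-- Local storage table on this host.
CREATE TABLE data_local (
    ts DateTime,
    value String
) ENGINE = MergeTree ORDER BY ts;

-- Materialized view feeding the local table from the Kafka table.
CREATE MATERIALIZED VIEW kafka_consumer TO data_local
AS SELECT ts, value FROM kafka_queue;
```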
If this approach isn't giving you the throughput you need, you can introduce additional Kafka tables and materialized views on the same host. The key is using the same consumer group name throughout.
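For example, a second Kafka table and view on the same host (again placeholder names) — because it joins the same consumer group, Kafka assigns it its own partitions:

```sql
-- Second consumer pair on the same host, same consumer group.
CREATE TABLE kafka_queue2 AS kafka_queue
ENGINE = Kafka('kafka_host:9092', 'topic', 'clickhouse_group', 'JSONEachRow');

CREATE MATERIALIZED VIEW kafka_consumer2 TO data_local
AS SELECT ts, value FROM kafka_queue2;
```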
The problem was on my side: invalid columns in the materialized view 'consumer'. BTW, if anyone needs to do the same task, here's the data map:
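Roughly (same placeholder names as in the question; the fix was making the view's SELECT produce exactly the columns of the target table):

```sql
-- Data map: Kafka topic -> Kafka table 'queue' -> MV 'consumer'
--           -> ReplicatedMergeTree 'data' (on every replica)
--           -> Distributed 'data_all' for cluster-wide SELECTs.
-- Corrected view: column names and types must match the target table.
CREATE MATERIALIZED VIEW consumer TO data
AS SELECT ts, value   -- must match the columns of 'data'
FROM queue;
```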