keyzj

Reputation: 341

Can't get data from Kafka to distributed table

I need to get data from a Kafka queue (filled by my script) to every replica in a ClickHouse (CH) cluster.

I've created:

  1. 'queue' table (Kafka engine) on every replica;
  2. 'consumer' materialized view (get data from 'queue' to distributed table) on every replica;
  3. 'data' distributed table;
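For context, the DDL looked roughly like this (broker, topic, cluster, and column names here are placeholders, not my real ones):

```sql
-- Rough sketch of the setup above; schema and connection details are placeholders.
-- 1. Kafka engine table on every replica
CREATE TABLE queue (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka:9092', 'my_topic', 'my_group', 'JSONEachRow');

-- 3. Distributed table over the per-replica local tables
--    (created before the view, since the view targets it)
CREATE TABLE data AS data_local
ENGINE = Distributed(my_cluster, default, data_local, rand());

-- 2. Materialized view pushing rows from 'queue' into the distributed table
CREATE MATERIALIZED VIEW consumer TO data
AS SELECT ts, value FROM queue;
```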

While I'm putting data into Kafka, I'm pretty sure the tables accept data (checked with a simple SELECT count(*) FROM data), but I always get this:

"Progress: 1.55 thousand rows, 1.24 MB (297.46 rows/s., 237.18 KB/s.) Received exception from server (version 18.14.17): Code: 159. DB::Exception: Received from host:port. DB::Exception: Failed to claim consumer: . 0 rows in set. Elapsed: 5.313 sec. Processed 1.55 thousand rows, 1.24 MB (291.94 rows/s., 232.78 KB/s.)"

When I stop filling Kafka, I have a short time window during which my query can complete. But after a few seconds I get 0 counts from every table I have created.

Upvotes: 0

Views: 975

Answers (2)

Andrew

Reputation: 36

While the approach shared by Keyzj works and is useful when you want to use the distributed table to control sharding, there's another approach that improves throughput and reliability.

You can create Kafka tables on all machines in your cluster, with a separate materialized view feeding the local table on each host. As long as the same consumer group name is used in the Kafka table definition, Kafka internals will ensure that each host consumes from a unique set of partitions. You should also make sure the number of Kafka partitions is 2-3x the number of hosts. For instance, if you have 4 hosts in your cluster, each host will be set up like this:

  • Kafka table with num_consumers = 1
  • Local merge tree table
  • Materialized view feeding the local table from the Kafka table
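A per-host sketch of that layout (all names and the schema are illustrative — the important part is that 'ch_group' is identical on every host):

```sql
-- Kafka table; the same group name ('ch_group' here) on every host lets
-- Kafka assign each host a disjoint set of partitions
CREATE TABLE queue (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka:9092', 'my_topic', 'ch_group', 'JSONEachRow');

-- Local MergeTree table holding this host's share of the data
CREATE TABLE data_local (
    ts DateTime,
    value String
) ENGINE = MergeTree
ORDER BY ts;

-- Materialized view feeding the local table from the Kafka table
CREATE MATERIALIZED VIEW consumer TO data_local
AS SELECT ts, value FROM queue;
```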

If this approach isn't giving you the throughput you need, you can introduce additional Kafka tables and materialized views on the same host. The key is using the same consumer group name throughout.
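Scaling out on a single host might look like this (hypothetical names, assuming the tables from the setup above already exist):

```sql
-- Second Kafka table on the same host; the group name must match the
-- existing one so partition assignment stays unique across all consumers
CREATE TABLE queue2 (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka:9092', 'my_topic', 'ch_group', 'JSONEachRow');

-- Second materialized view feeding the same local table
CREATE MATERIALIZED VIEW consumer2 TO data_local
AS SELECT ts, value FROM queue2;
```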

Upvotes: 2

keyzj

Reputation: 341

The problem was on my side: invalid columns in the materialized view 'consumer'. By the way, if anyone needs to do the same task, here's the layout:

  1. Create 'local' tables on all hosts in the cluster;
  2. Create distributed tables on all hosts in the cluster;
  3. Create a Kafka engine table 'queue' + a materialized view 'consumer' on one host.
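The three steps above, as a rough DDL sketch (ON CLUSTER, cluster name, and the schema are illustrative):

```sql
-- 1. Local table on all hosts
CREATE TABLE data_local ON CLUSTER my_cluster (
    ts DateTime,
    value String
) ENGINE = MergeTree
ORDER BY ts;

-- 2. Distributed table on all hosts, sharding over the local tables
CREATE TABLE data ON CLUSTER my_cluster AS data_local
ENGINE = Distributed(my_cluster, default, data_local, rand());

-- 3. Kafka table + materialized view on one host only;
--    the view writes into the distributed table, which spreads
--    the rows across the cluster
CREATE TABLE queue (
    ts DateTime,
    value String
) ENGINE = Kafka('kafka:9092', 'my_topic', 'my_group', 'JSONEachRow');

CREATE MATERIALIZED VIEW consumer TO data
AS SELECT ts, value FROM queue;
```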

Upvotes: 2
