Reputation: 53
I have a question about Distributed tables in ClickHouse. Say I have two ClickHouse nodes. Each node has a table datatable with the ReplacingMergeTree engine (I know it does not guarantee full deduplication, and I'm OK with that), which is fed from Kafka through a Kafka engine table (each node reads from its own topic). Each node also has a datatable_distributed table. Now suppose that, for some reason, exactly the same message arrives in both Kafka topics. Do I understand correctly that, at the end of the day, a query against datatable_distributed will return two rows for that message, simply because the Distributed table just reads from the two datatables on different nodes and does no deduplication?
Upvotes: 0
Views: 2969
Reputation: 13310
Yes. Replacing merges do not happen across nodes: ReplacingMergeTree only collapses duplicates within a single node's table. You should use a sharding key so that records with the same primary key land on the same node. For example, you can insert into the Distributed engine table (from Kafka, using a MaterializedView) and set a sharding expression based on the primary key (not rand()).
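
To make the difference concrete, here is a small Python model of the two setups. It is only a sketch of the idea, not ClickHouse code: the names shard_for, merged and distributed_select are hypothetical, crc32 stands in for whatever sharding expression you choose, and merged assumes the ReplacingMergeTree table has been fully merged.

```python
import zlib

NUM_SHARDS = 2  # assumption: two nodes, as in the question


def shard_for(key: str) -> int:
    """Deterministic sharding expression: the same key always maps to the same shard."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def merged(rows):
    """Model a fully merged ReplacingMergeTree table: keep the last row per key."""
    latest = {}
    for key, value in rows:
        latest[key] = value
    return list(latest.items())


def distributed_select(shards):
    """Model a read through a Distributed table: concatenate the rows of
    every shard, with no deduplication across shards."""
    result = []
    for rows in shards:
        result.extend(merged(rows))
    return result


message = ("order-42", "payload")  # hypothetical (key, value) message

# Setup from the question: each node consumes its own topic,
# and both topics carry the same message.
per_node = [[message], [message]]
rows_per_node = distributed_select(per_node)

# Setup from the answer: every insert is routed by a key-based
# sharding expression, so both copies land on the same shard.
by_key = [[] for _ in range(NUM_SHARDS)]
for key, value in [message, message]:
    by_key[shard_for(key)].append((key, value))
rows_by_key = distributed_select(by_key)

print(len(rows_per_node), len(rows_by_key))  # prints "2 1"
```

With per-node topics, each shard keeps its own copy and the combined read returns both. With key-based routing, the duplicates meet on one shard, where a replacing merge can collapse them. A random sharding expression would not give that guarantee, which is why rand() is ruled out above.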
Upvotes: 1