Reputation: 87
ClickHouse in the write, whether it can be based on more than one column as Key, if it is a duplicate Key data inserted when replacing the existing data of the same Key, to ensure that the same Key, only one line of data. At the moment I know the solution is to do it through the ReplaceMergeTree engine, but this way is an asynchronous backend to keep merging, I wish there was a synchronous solution.
I also don't want to do it in a query with argMax or final, because I want to do replacement de-duplication at write time for the purpose of removing argMax from the SQL query.
Upvotes: 0
Views: 3769
Reputation: 3335
You can look into the ReplacingMergeTree engine which will update/replace records if the sort keys are identical.
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree
CREATE TABLE myFirstReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;
INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');
SELECT * FROM myFirstReplacingMT FINAL;
┌─key─┬─someCol─┬───────────eventTime─┐
│ 1 │ second │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
The FINAL
keyword is important here as this forces the replacement to take place, in advance of returning the result. Without this instruction the replacement would eventually take place but on an undeterminable schedule. FINAL
only needs to be called once.
Upvotes: 0
Reputation: 13310
It's impossible. CH is not MySQL.
https://github.com/ClickHouse/ClickHouse/issues/5125
Upvotes: 1