Reputation: 23
I am trying to upgrade a ClickHouse cluster from version 18.8 to 19.9.2. Previously I had a cron job that deleted old data from the database; I want to start using the TTL feature instead.
Simplified table definition:
CREATE TABLE myTimeseries(
    timestamp_ns Int64,
    source_id String,
    data String,
    date Date MATERIALIZED toDate(timestamp_ns/1e9),
    time DateTime MATERIALIZED toDateTime(timestamp_ns/1e9))
ENGINE = MergeTree()
PARTITION BY (source_id, toStartOfHour(time))
TTL date + toIntervalDay(7)
SETTINGS index_granularity=8192, merge_with_ttl_timeout=43200
The problem is that it does not delete old data. I could not find anything in the documentation that would help me debug this issue.
Questions:
How can I debug this issue? Is there a way to see when the data will be cleared in the future?
Might this be because the date field is materialized? I have another table where date is not a materialized field, and everything works fine.
Upvotes: 2
Views: 2869
Reputation: 56
Yes, you can use materialized fields with the TTL feature. I've attached a simple query that creates a table with a 5-minute TTL. It works fine with ClickHouse server version 20.4.5:
CREATE TABLE IF NOT EXISTS test.profiling
(
    headtime UInt64,
    date DateTime MATERIALIZED toDateTime(headtime),
    id Int64,
    operation_name String,
    duration Int64
)
ENGINE MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, id)
TTL date + INTERVAL 5 MINUTE
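As a quick sanity check (a hedged sketch using the test.profiling table from the example above; the inserted values are made up for illustration), you can insert a row whose headtime is already older than the TTL, force a merge, and confirm the row is gone:

```sql
-- Insert a row whose materialized date is already past the 5-minute TTL:
INSERT INTO test.profiling (headtime, id, operation_name, duration)
VALUES (toUnixTimestamp(now() - INTERVAL 10 MINUTE), 1, 'expired_op', 100);

-- Force an off-schedule merge so the TTL is applied immediately:
OPTIMIZE TABLE test.profiling FINAL;

-- The expired row should no longer be returned:
SELECT count() FROM test.profiling WHERE id = 1;
```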
An important note from the ClickHouse documentation:
Data with an expired TTL is removed when ClickHouse merges data parts.
When ClickHouse sees that data is expired, it performs an off-schedule merge. To control the frequency of such merges, you can set merge_with_ttl_timeout. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources.
If you perform a SELECT query between merges, you may get expired data. To avoid that, use the OPTIMIZE query before SELECT.
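To address the original debugging question: each active data part records metadata you can inspect via the system.parts table. A sketch of such a query is below; note that the delete_ttl_info_min/delete_ttl_info_max columns are an assumption for older releases, as they only appeared in later 20.x versions (on versions without them, min_date/max_date still show each part's date range):

```sql
-- List active parts of the table with their date ranges and,
-- where available, the TTL window recorded for deletion:
SELECT
    partition,
    name,
    min_date,
    max_date,
    delete_ttl_info_min,
    delete_ttl_info_max
FROM system.parts
WHERE database = 'test' AND table = 'profiling' AND active;
```

Parts whose delete_ttl_info_max is already in the past are candidates for removal at the next TTL merge, so this shows roughly when data will be cleared.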
Upvotes: 2