mgencel
mgencel

Reputation: 23

ClickHouse TTL on materialized column

I am trying to upgrade the clickhouse cluster from version 18.8 to 19.9.2. Previously, I had a cronjob that deletes old data from the database. I want to start using TTL feature instead.

Simplified table definition:

    CREATE TABLE myTimeseries(
                   timestamp_ns Int64,
                   source_id String,
                   data String,
                   date Date MATERIALIZED toDate(timestamp_ns/1e9),
                   time DateTime MATERIALIZED toDateTime(timestamp_ns/1e9)) 
    ENGINE = MergeTree()
    PARTITION BY (source_id, toStartOfHour(time))
    TTL date + toInterValDay(7)
    SETTINGS index_granularity=8192, merge_with_ttl_timeout=43200

The problem is, it does not delete old data. I could not find anything in the documentation that would help debug this issue.

Questions:

  1. How can I debug this issue? (Is there a way to see when the data will be cleared in the future)?

  2. Might this be because of date field being materialized? I have another table where date is not a materialized field and everything works fine.

Upvotes: 2

Views: 2869

Answers (1)

D.Sporyagin
D.Sporyagin

Reputation: 56

Yes, you can use materialized fields with TTL feature. I've attached simple query that create table with 5 minutes interval to delete. It works fine with clickhouse server version 20.4.5

CREATE TABLE IF NOT EXISTS test.profiling
(
    headtime UInt64, 
    date DateTime MATERIALIZED toDateTime(headtime),  
    id Int64,  
    operation_name String,
    duration Int64
)   
ENGINE MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, id)
TTL date + INTERVAL 5 MINUTE

And important note from clickhouse documentation:

Data with an expired TTL is removed when ClickHouse merges data parts.

When ClickHouse see that data is expired, it performs an off-schedule merge. To control the frequency of such merges, you can set merge_with_ttl_timeout. If the value is too low, it will perform many off-schedule merges that may consume a lot of resources.

If you perform the SELECT query between merges, you may get expired data. To avoid it, use the OPTIMIZE query before SELECT.

Upvotes: 2

Related Questions