Reputation: 3
I have Added a skip index on a field (status_id) in clickhouse. It is used and working when I use my query as below:
EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable WHERE
created_date='2022-10-05' and status_id=123;
The result:
Expression ((Projection + Before ORDER BY))
Aggregating
Expression (Before GROUP BY)
Filter (WHERE)
SettingQuotaAndLimits (Set limits and quota after reading from storage)
ReadFromMergeTree
Indexes:
MinMax
Keys:
created_date
Condition: (created_date in [19270, 19270])
Parts: 3/138
Granules: 346/8840
Partition
Keys:
toYYYYMM(created_date)
Condition: (toYYYYMM(created_date) in [202210, 202210])
Parts: 3/3
Granules: 346/346
PrimaryKey
Condition: true
Parts: 3/3
Granules: 346/346
Skip
Name: status_index
Description: set GRANULARITY 1
Parts: 3/3
Granules: 14/346
But when I use the final
modifier, the skip index is not used anymore:
EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable FINAL WHERE
created_date='2022-10-05' and status_id=123;
The result:
Expression ((Projection + Before ORDER BY))
Aggregating
Expression (Before GROUP BY)
Filter (WHERE)
SettingQuotaAndLimits (Set limits and quota after reading from storage)
ReadFromMergeTree
Indexes:
MinMax
Keys:
created_date
Condition: (created_date in [19270, 19270])
Parts: 3/136
Granules: 346/8838
Partition
Keys:
toYYYYMM(created_date)
Condition: (toYYYYMM(created_date) in [202210, 202210])
Parts: 3/3
Granules: 346/346
PrimaryKey
Condition: true
Parts: 3/3
Granules: 346/346
I couldn't find anything in the documentation that Final keyword would prevent clickhouse from utilizing skip indexes. So I'm wondering what is wrong.
Clickhouse Version: 22.3.15.33
Table Engine: Replacing Merge Tree
Upvotes: 0
Views: 1006
Reputation: 13300
it's deliberately otherwise you will get an incorrect result https://github.com/ClickHouse/ClickHouse/pull/34243
there is a setting: use_skip_indexes_if_final
It is impossible to get the correct result in general if one use SKIP INDEX and FINAL. SKIP INDEX works before FINAL and filters out FINAL rows.
imagine you have a table
key | version | some_column_which_should_calculated_during_the_final
---------------------------------------------------------------------------
1 | 1 | true
1 | 2 | false
select * from .... FINAL where some_column_which_should_calculated_during_the_final=true
should return zero rows. Because the final row version=2 is = false
But if you use skip index it will return 1 | 1 | true
because SKIP index filters out version=2
, then final will evaluate 1 row instead of 2
Pipeline
so the same issue with Partition Pruning (if you use some_column_which_should_calculated_during_the_final in PARITION BY)
Upvotes: 1