Hoptical
Hoptical

Reputation: 3

Clickhouse doesn't use skip index with Final modifer in query

I have Added a skip index on a field (status_id) in clickhouse. It is used and working when I use my query as below:

EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable WHERE 
created_date='2022-10-05' and status_id=123;

The result:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        SettingQuotaAndLimits (Set limits and quota after reading from storage)
          ReadFromMergeTree
          Indexes:
            MinMax
              Keys: 
                created_date
              Condition: (created_date in [19270, 19270])
              Parts: 3/138
              Granules: 346/8840
            Partition
              Keys: 
                toYYYYMM(created_date)
              Condition: (toYYYYMM(created_date) in [202210, 202210])
              Parts: 3/3
              Granules: 346/346
            PrimaryKey
              Condition: true
              Parts: 3/3
              Granules: 346/346
            Skip
              Name: status_index
              Description: set GRANULARITY 1
              Parts: 3/3
              Granules: 14/346

But when I use the final modifier, the skip index is not used anymore:

EXPLAIN INDEXES=1
SELECT COUNT(*) FROM mytable FINAL WHERE 
created_date='2022-10-05' and status_id=123;

The result:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        SettingQuotaAndLimits (Set limits and quota after reading from storage)
          ReadFromMergeTree
          Indexes:
            MinMax
              Keys: 
                created_date
              Condition: (created_date in [19270, 19270])
              Parts: 3/136
              Granules: 346/8838
            Partition
              Keys: 
                toYYYYMM(created_date)
              Condition: (toYYYYMM(created_date) in [202210, 202210])
              Parts: 3/3
              Granules: 346/346
            PrimaryKey
              Condition: true
              Parts: 3/3
              Granules: 346/346

I couldn't find anything in the documentation that Final keyword would prevent clickhouse from utilizing skip indexes. So I'm wondering what is wrong.

Clickhouse Version: 22.3.15.33

Table Engine: Replacing Merge Tree

Upvotes: 0

Views: 1006

Answers (1)

Denny Crane
Denny Crane

Reputation: 13300

it's deliberately otherwise you will get an incorrect result https://github.com/ClickHouse/ClickHouse/pull/34243

there is a setting: use_skip_indexes_if_final

It is impossible to get the correct result in general if one use SKIP INDEX and FINAL. SKIP INDEX works before FINAL and filters out FINAL rows.

imagine you have a table

key   | version      | some_column_which_should_calculated_during_the_final 
---------------------------------------------------------------------------
1     |     1        | true
1     |     2        | false

select * from .... FINAL where some_column_which_should_calculated_during_the_final=true should return zero rows. Because the final row version=2 is = false

But if you use skip index it will return 1 | 1 | true because SKIP index filters out version=2, then final will evaluate 1 row instead of 2

Pipeline

  1. Partition Pruning
  2. Primary index
  3. Skip index / PREWHERE
  4. FINAL
  5. WHERE

so the same issue with Partition Pruning (if you use some_column_which_should_calculated_during_the_final in PARITION BY)

Upvotes: 1

Related Questions