radonthetyrant
radonthetyrant

Reputation: 1415

Modifying primary_key/sort_key for an existing table that has materialized views

I Have a table stats_view_source that is described as follows:

CREATE TABLE stats_view_source
(
    `timestamp` DateTime,
    `user_ip` String,
    `user_country` Nullable(String),
    `user_language` Nullable(String),
    `user_agent` Nullable(String),
    `entity_type` String,
    `entity_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/.../{uuid}/{shard}', '{replica}')
PRIMARY KEY (entity_type, entity_id)
ORDER BY (entity_type, entity_id, timestamp)
SETTINGS index_granularity = 8192

It has multiple materialized views attached to it that aggregate daily, monthly and yearly stats into different tables.

Too late did i discover that i need the user_ip to be part of the sort_key so that multiple rows to the same sort_key in the same second do not get deduplicated. I attempted to rectify this by modifying the sort_key to include (entity_type, entity_id, user_ip, timestamp), however after investigating errors for this query i am now aware that this only works with new colums and for adding existing columns to the sort_key, i have to recreate the table. (https://github.com/ClickHouse/ClickHouse/issues/13835)

My concern is that the tables already have a few hundred-thousand rows, aggregated through multiple materialized views, that i cannot start from a clean slate. As i am forced to recreate a new table, i am looking for the best way to accomplish this.

CREATE TABLE IF NOT EXISTS stats_view_source_tmp
            (
                `timestamp`   DateTime,
                `user_ip`     String,
                `user_country` Nullable(String),
                `user_language` Nullable(String),
                `user_agent` Nullable(String),
                `entity_type` String,
                `entity_id`   String
                )
                ENGINE = ReplicatedMergeTree()
                PRIMARY KEY (entity_type, entity_id, user_ip)
                ORDER BY (entity_type, entity_id, user_ip, timestamp)
                SETTINGS index_granularity = 8192;
INSERT INTO stats_view_source_tmp SELECT * FROM stats_view_source;
EXCHANGE TABLE stats_view_source_tmp AND stats_view_source

What are my options? Is my suggested path correct or are there better ways to acomplish this?

Upvotes: 0

Views: 56

Answers (1)

Jianfei Hu
Jianfei Hu

Reputation: 494

Your approach should work. According to the documentation, materialized view behaves like a insert triggers. So EXCHANGE TABLES would not update the existing materialized view.

Materialized views in ClickHouse are implemented more like insert triggers. If there’s some aggregation in the view query, it’s applied only to the batch of freshly inserted data. Any changes to existing data of source table (like update, delete, drop partition, etc.) does not change the materialized view.

I quickly tried and it works as expected:

Create table without sorting key

clickhouse-cloud :) CREATE MATERIALIZED VIEW sales_summary                                                                                                                                   
ENGINE = SummingMergeTree()                                                                                                                                                                  
ORDER BY (date, country)                                                                                                                                                                     
AS SELECT                                                                                                                                                                                    
    date,                                                                                                                                                                                    
    product,                                                                                                                                                                                 
    country,                                                                                                                                                                                 
    sum(quantity) AS total_quantity,                                                                                                                                                         
    sum(price * quantity) AS total_revenue                                                                                                                                                   
FROM sales                                                                                                                                                                                   
GROUP BY date, product, country;                                                                                                                                                             
                                                                                                                                                                                             
CREATE MATERIALIZED VIEW sales_summary                                                                                                                                                       
ENGINE = SummingMergeTree                                                                                                                                                                    
ORDER BY (date, country) AS                                                                                                                                                                  
SELECT                                                                                                                                                                                       
    date,                                                                                                                                                                                    
    product,                                                                                                                                                                                 
    country,                                                                                                                                                                                 
    sum(quantity) AS total_quantity,                                                                                                                                                         
    sum(price * quantity) AS total_revenue                                                                                                                                                   
FROM sales                                                                                                                                                                                   
GROUP BY                                                                                                                                                                                     
    date,                                                                                                                                                                                    
    product,                                                                                                                                                                                 
    country                                                                                                                                                                                  
                                                                                                                                                                                             
Query id: 705225b5-ccce-4e3b-abde-97b67568bb68                                                                                                                                               
                                                                                                                                                                                             
Ok.                                                                                                                                                                                          
                                                                                                                                                                                             
0 rows in set. Elapsed: 0.221 sec.                                                                                                                                                           
                                                                                                                                                                                             
clickhouse-cloud :) INSERT INTO sales (date, product, quantity, price, country) VALUES                                                                                                       
('2024-05-01', 'Product A', 10, 15.5, 'USA'),                                                                                                                                                
('2024-05-01', 'Product B', 5, 20.0, 'USA'),                                                                                                                                                 
('2024-05-02', 'Product A', 7, 15.5, 'Canada'),                                                                                                                                              
('2024-05-02', 'Product B', 3, 20.0, 'Canada'),                                               
('2024-05-03', 'Product A', 12, 15.5, 'USA'),                                                 
('2024-05-03', 'Product B', 8, 20.0, 'USA');   

Create temporary table with additional sorting key

CREATE TABLE sales_by_country (                                                                                                                                          
    date Date,                                                                                                                                                                               
    product String,                                                                                                                                                                          
    quantity UInt32,                                                                                                                                                                         
    price Float32,                                                                                                                                                                           
    country String                                                                                                                                                                           
) ENGINE = MergeTree()                                                                                                                                                                       
ORDER BY (country, date); 


INSERT INTO sales_by_country (date, product, quantity, price, country) SELECT
    date,
    product,
    quantity,
    price,
    country
FROM sales

EXCHANGE TABLES sales_by_country AND sales

At this point, checking sales_summary it hasn't changed.

INSERT into exchanged table validate materilized view

clickhouse-cloud :) INSERT INTO sales (date, product, quantity, price, country) VALUES
('2024-05-06', 'Product A', 10, 15.5, 'Mexico')

INSERT INTO sales (date, product, quantity, price, country) FORMAT Values

Query id: 3eee4c6b-42e9-47b4-9741-fd09cc589aed

Ok.

1 row in set. Elapsed: 0.411 sec. 

clickhouse-cloud :) select * from sales_summary

SELECT *
FROM sales_summary

Query id: 9dd4f6ec-2d74-4826-b88f-c192f740e1c5

┌───────date─┬─product───┬─country─┬─total_quantity─┬─total_revenue─┐
│ 2024-05-01 │ Product B │ USA     │             15 │           255 │
│ 2024-05-02 │ Product A │ Canada  │             10 │         168.5 │
│ 2024-05-03 │ Product A │ USA     │             20 │           346 │
└────────────┴───────────┴─────────┴────────────────┴───────────────┘
┌───────date─┬─product───┬─country─┬─total_quantity─┬─total_revenue─┐
│ 2024-05-06 │ Product A │ Mexico  │             10 │           155 │
└────────────┴───────────┴─────────┴────────────────┴───────────────┘

4 rows in set. Elapsed: 0.007 sec. 

Upvotes: 1

Related Questions