Reputation: 1415
I Have a table stats_view_source
that is described as follows:
CREATE TABLE stats_view_source
(
`timestamp` DateTime,
`user_ip` String,
`user_country` Nullable(String),
`user_language` Nullable(String),
`user_agent` Nullable(String),
`entity_type` String,
`entity_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/.../{uuid}/{shard}', '{replica}')
PRIMARY KEY (entity_type, entity_id)
ORDER BY (entity_type, entity_id, timestamp)
SETTINGS index_granularity = 8192
It has multiple materialized views attached to it that aggregate daily, monthly and yearly stats into different tables.
Too late did i discover that i need the user_ip to be part of the sort_key so that multiple rows to the same sort_key in the same second do not get deduplicated. I attempted to rectify this by modifying the sort_key to include (entity_type, entity_id, user_ip, timestamp), however after investigating errors for this query i am now aware that this only works with new colums and for adding existing columns to the sort_key, i have to recreate the table. (https://github.com/ClickHouse/ClickHouse/issues/13835)
My concern is that the tables already have a few hundred-thousand rows, aggregated through multiple materialized views, that i cannot start from a clean slate. As i am forced to recreate a new table, i am looking for the best way to accomplish this.
CREATE TABLE IF NOT EXISTS stats_view_source_tmp
(
`timestamp` DateTime,
`user_ip` String,
`user_country` Nullable(String),
`user_language` Nullable(String),
`user_agent` Nullable(String),
`entity_type` String,
`entity_id` String
)
ENGINE = ReplicatedMergeTree()
PRIMARY KEY (entity_type, entity_id, user_ip)
ORDER BY (entity_type, entity_id, user_ip, timestamp)
SETTINGS index_granularity = 8192;
INSERT INTO stats_view_source_tmp SELECT * FROM stats_view_source;
EXCHANGE TABLE stats_view_source_tmp AND stats_view_source
What are my options? Is my suggested path correct or are there better ways to acomplish this?
Upvotes: 0
Views: 56
Reputation: 494
Your approach should work. According to the documentation, materialized view behaves like a insert triggers. So EXCHANGE TABLES
would not update the existing materialized view.
Materialized views in ClickHouse are implemented more like insert triggers. If there’s some aggregation in the view query, it’s applied only to the batch of freshly inserted data. Any changes to existing data of source table (like update, delete, drop partition, etc.) does not change the materialized view.
I quickly tried and it works as expected:
clickhouse-cloud :) CREATE MATERIALIZED VIEW sales_summary
ENGINE = SummingMergeTree()
ORDER BY (date, country)
AS SELECT
date,
product,
country,
sum(quantity) AS total_quantity,
sum(price * quantity) AS total_revenue
FROM sales
GROUP BY date, product, country;
CREATE MATERIALIZED VIEW sales_summary
ENGINE = SummingMergeTree
ORDER BY (date, country) AS
SELECT
date,
product,
country,
sum(quantity) AS total_quantity,
sum(price * quantity) AS total_revenue
FROM sales
GROUP BY
date,
product,
country
Query id: 705225b5-ccce-4e3b-abde-97b67568bb68
Ok.
0 rows in set. Elapsed: 0.221 sec.
clickhouse-cloud :) INSERT INTO sales (date, product, quantity, price, country) VALUES
('2024-05-01', 'Product A', 10, 15.5, 'USA'),
('2024-05-01', 'Product B', 5, 20.0, 'USA'),
('2024-05-02', 'Product A', 7, 15.5, 'Canada'),
('2024-05-02', 'Product B', 3, 20.0, 'Canada'),
('2024-05-03', 'Product A', 12, 15.5, 'USA'),
('2024-05-03', 'Product B', 8, 20.0, 'USA');
CREATE TABLE sales_by_country (
date Date,
product String,
quantity UInt32,
price Float32,
country String
) ENGINE = MergeTree()
ORDER BY (country, date);
INSERT INTO sales_by_country (date, product, quantity, price, country) SELECT
date,
product,
quantity,
price,
country
FROM sales
EXCHANGE TABLES sales_by_country AND sales
At this point, checking sales_summary it hasn't changed.
clickhouse-cloud :) INSERT INTO sales (date, product, quantity, price, country) VALUES
('2024-05-06', 'Product A', 10, 15.5, 'Mexico')
INSERT INTO sales (date, product, quantity, price, country) FORMAT Values
Query id: 3eee4c6b-42e9-47b4-9741-fd09cc589aed
Ok.
1 row in set. Elapsed: 0.411 sec.
clickhouse-cloud :) select * from sales_summary
SELECT *
FROM sales_summary
Query id: 9dd4f6ec-2d74-4826-b88f-c192f740e1c5
┌───────date─┬─product───┬─country─┬─total_quantity─┬─total_revenue─┐
│ 2024-05-01 │ Product B │ USA │ 15 │ 255 │
│ 2024-05-02 │ Product A │ Canada │ 10 │ 168.5 │
│ 2024-05-03 │ Product A │ USA │ 20 │ 346 │
└────────────┴───────────┴─────────┴────────────────┴───────────────┘
┌───────date─┬─product───┬─country─┬─total_quantity─┬─total_revenue─┐
│ 2024-05-06 │ Product A │ Mexico │ 10 │ 155 │
└────────────┴───────────┴─────────┴────────────────┴───────────────┘
4 rows in set. Elapsed: 0.007 sec.
Upvotes: 1