Reputation: 21
I created a table and two materialized views recursively.
Table:
CREATE TABLE `log_details` (
date String,
event_time DateTime,
username String,
city String)
ENGINE = MergeTree()
ORDER BY (date, event_time)
PARTITION BY date TTL event_time + INTERVAL 1 MONTH
Materialized views:
CREATE MATERIALIZED VIEW `log_u_c_day_mv`
ENGINE = SummingMergeTree()
PARTITION BY date
ORDER BY (date, username, city)
AS
SELECT date, username, city, count() as times
FROM `log_details`
GROUP BY date, username, city
CREATE MATERIALIZED VIEW `log_u_day_mv`
ENGINE = SummingMergeTree()
PARTITION BY date
ORDER BY (date, username)
AS
SELECT date, username, SUM(times) as total_times
FROM `.inner.log_u_c_day_mv`
GROUP BY date, username
Insert into log_details → Insert into log_u_c_day_mv → Insert into log_u_day_mv.
log_u_day_mv
is not be optimized after 15 minutes inserting log_u_c_day_mv
even over one day.
I tried to optimize log_u_day_mv
manually and it works.
OPTIMIZE TABLE `.inner.log_u_day_mv` PARTITION 20210110
But ClickHouse does not timely optimize it. How to solve it?
Upvotes: 2
Views: 1551
Reputation: 13310
Data always is not fully aggregated/collapsed in MT.
If you do optimize final
the next insert into creates a new part.
CH does not merge parts by time. Merge scheduler selects parts by own algorithm based on the current node workload / number of parts / size of parts.
SummingMT MUST BE QUERIED with sum / groupby ALWAYS.
select sum(times), username
from log_u_day_mv
group by username
DO NOT USE from log_u_day_mv FINAL
it reads excessive columns!!!!!!!!!!!!!!
Upvotes: 1