Phoes Huang
Phoes Huang

Reputation: 21

ClickHouse: Materialized view is not be timely optimized to merge the partitions

I created a table and two materialized views recursively.

Table:

CREATE TABLE `log_details` (
  date String, 
  event_time DateTime, 
  username String, 
  city String) 
ENGINE = MergeTree() 
ORDER BY (date, event_time) 
PARTITION BY date TTL event_time + INTERVAL 1 MONTH

Materialized views:

CREATE MATERIALIZED VIEW `log_u_c_day_mv` 
ENGINE = SummingMergeTree() 
PARTITION BY date 
ORDER BY (date, username, city) 
AS 
SELECT date, username, city, count() as times 
FROM `log_details` 
GROUP BY date, username, city
CREATE MATERIALIZED VIEW `log_u_day_mv` 
ENGINE = SummingMergeTree() 
PARTITION BY date 
ORDER BY (date, username) 
AS 
SELECT date, username, SUM(times) as total_times 
FROM `.inner.log_u_c_day_mv` 
GROUP BY date, username

Insert into log_details → Insert into log_u_c_day_mv → Insert into log_u_day_mv.

log_u_day_mv is not be optimized after 15 minutes inserting log_u_c_day_mv even over one day.

I tried to optimize log_u_day_mv manually and it works.

OPTIMIZE TABLE `.inner.log_u_day_mv` PARTITION 20210110

But ClickHouse does not timely optimize it. How to solve it?

Upvotes: 2

Views: 1551

Answers (1)

Denny Crane
Denny Crane

Reputation: 13310

Data always is not fully aggregated/collapsed in MT.

If you do optimize final the next insert into creates a new part.

CH does not merge parts by time. Merge scheduler selects parts by own algorithm based on the current node workload / number of parts / size of parts.

SummingMT MUST BE QUERIED with sum / groupby ALWAYS.

select sum(times), username
from log_u_day_mv
group by username

DO NOT USE from log_u_day_mv FINAL it reads excessive columns!!!!!!!!!!!!!!

Upvotes: 1

Related Questions