Reputation: 75
I have several questions about custom partitioning in ClickHouse. Background: I am trying to build a TSDB on top of ClickHouse. We need to support very large batch writes and complicated OLAP reads.
Let's assume we use the standard partitioning by month and have 20 nodes in our ClickHouse cluster. Will all the data from the same month flow to the same node, or will ClickHouse do some internal balancing and spread the data from the same month across several nodes?
If all the data from the same month is written to the same node, that will be very bad for our scenario. In that case I would probably consider partitioning by (timestamp, tags), where tags are the different tags that define the data source. Our monitoring system will write data to the TSDB every 30 seconds. Our read pattern is usually a range scan on a single table or a join of several tables on a column. Any advice on how I should customize my partitioning strategy?
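To make the month-versus-node question concrete, here is a minimal Python model, not ClickHouse code. It assumes that the partition label is derived from the timestamp (analogous to `toYYYYMM`), while the node is chosen by a separate sharding rule, as with the sharding key of a Distributed table. The function names, the CRC32 hash, and the `ip:port` source strings are hypothetical and exist only for illustration.

```python
import zlib
from collections import Counter
from datetime import datetime

NUM_SHARDS = 20  # cluster size taken from the question


def partition_id(ts: datetime) -> str:
    """Monthly partition label, analogous to toYYYYMM(timestamp)."""
    return ts.strftime("%Y%m")


def shard_for(source: str, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical sharding rule: hash of the data source modulo shard count."""
    return zlib.crc32(source.encode("utf-8")) % num_shards


def place_rows(rows):
    """Return {shard: Counter(partition label -> row count)}."""
    placement = {}
    for ts, source in rows:
        shard = shard_for(source)
        placement.setdefault(shard, Counter())[partition_id(ts)] += 1
    return placement


# 200 hypothetical data sources, one sample each, all within the same month.
rows = [(datetime(2019, 1, 15, 0, 0, 30), f"10.0.0.{i}:9100") for i in range(200)]
placement = place_rows(rows)
print(len(placement), "shards hold rows from partition 201901")
```

In this model every shard that receives rows ends up with its own copy of the `201901` partition, so the month decides how rows are grouped on a node, not which node receives them.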
Since ClickHouse does not support secondary indexes and we will run selection queries on columns, I think I should put the important columns into the primary key, so my primary key will probably look like (timestamp, ip, port...). Any advice on this design? Or could someone give a good reason why ClickHouse does not support secondary indexes, such as bitmap indexes, on other non-primary-key columns?
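The order of columns in such a key matters, which a small Python sketch can show. It is a simplification under stated assumptions: rows are kept sorted by the key tuple and a query can only narrow its scan to one contiguous range found by binary search. ClickHouse's real index is sparse and works on blocks of rows rather than single rows, so the numbers here are only indicative. The helper `rows_scanned` and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def rows_scanned(sorted_rows, lo_key, hi_key):
    """Number of rows in the contiguous range [lo_key, hi_key] of a sorted list."""
    start = bisect_left(sorted_rows, lo_key)
    end = bisect_right(sorted_rows, hi_key)
    return end - start


timestamps = range(100)                  # 100 sample times
ips = [f"10.0.0.{i}" for i in range(10)]  # 10 hypothetical hosts

# The same 1000 rows stored under two different key orders.
by_ts_ip = sorted((t, ip) for t in timestamps for ip in ips)
by_ip_ts = sorted((ip, t) for t in timestamps for ip in ips)

# Query: all samples of one host over the whole time range.
target = "10.0.0.3"
scan_ts_first = rows_scanned(by_ts_ip, (0, target), (99, target))
scan_ip_first = rows_scanned(by_ip_ts, (target, 0), (target, 99))

print("key (timestamp, ip):", scan_ts_first, "rows in range")
print("key (ip, timestamp):", scan_ip_first, "rows in range")
```

With the timestamp first, a filter on a single ip over a wide time range covers almost the whole table in this model, while putting ip first keeps that host's rows contiguous. Which order is better depends on whether queries filter mostly by time or mostly by source.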
Upvotes: 1
Views: 2031
Reputation: 2554
Upvotes: 1