Richard Rymer
Richard Rymer

Reputation: 153

is there a better way to query system tables across a clickhouse cluster?

We have a modest clickhouse cluster, ~30 nodes, and want to collect usage stats on it. We are hoping to do this using scheduled queries against the system tables, but using a normal query only get information on the one node you happen to be connected to, and creating a distributed table only works with the *log system tables. We can loop over the nodes, but don't want to do that. Is there a way to get all the instances of a system table, such as system.parts, in one query?

Upvotes: 7

Views: 3503

Answers (3)

Richard Rymer
Richard Rymer

Reputation: 153

For the record, we ended up using materialized views:

CREATE MATERIALIZED VIEW _tmp.parts on cluster main_cluster 
engine = Distributed('main_cluster', 'system', 'parts', rand())
AS select * from system.parts

Upvotes: 1

Denny Crane
Denny Crane

Reputation: 13310

Distributed tables works with any type of tables and clusterAllReplicas as well.

create table test on cluster replicated as system.processes Engine=Distributed(replicated, system, processes);


SELECT
    FQDN(),
    elapsed
FROM test

┌─FQDN()────────────────────┬────elapsed─┐
│ hos.mycmdb.net            │ 0.00063795 │
└───────────────────────────┴────────────┘


SELECT
    FQDN(),
    elapsed
FROM clusterAllReplicas(replicated, system, sessions);

SELECT elapsed
FROM clusterAllReplicas(replicated, system, processes)

┌─────elapsed─┐
│ 0.005636027 │
└─────────────┘
┌─────elapsed─┐
│ 0.000228303 │
└─────────────┘
┌─────elapsed─┐
│ 0.000275745 │
└─────────────┘
┌─────elapsed─┐
│ 0.000311621 │
└─────────────┘
┌─────elapsed─┐
│ 0.000270791 │
└─────────────┘
┌─────elapsed─┐
│ 0.000288045 │
└─────────────┘
┌─────elapsed─┐
│ 0.001048277 │
└─────────────┘
┌─────elapsed─┐
│ 0.000256203 │
└─────────────┘


Upvotes: 8

vladimir
vladimir

Reputation: 15226

It can be used remote or remoteSecure functions that support multiple addresses:

SELECT
    hostName() AS host,
    any(partition),
    count()
FROM remote('node{01..30}-west.contoso.com', system, parts)
GROUP BY host

/*
┌─host──────────┬─any(partition)─┬─count()─┐
│ node01-west   │ 202012         │     733 │
..
│ node30-west   │ 202012         │     687 │
└───────────────┴────────────────┴─────────┘
*/

Upvotes: 5

Related Questions