Guillaume Caruso

Reputation: 23

ClickHouse seems quite slow

We are considering moving to ClickHouse as a time-series database for an IoT use case, so we followed the best practices for creating a table designed for that. However, when running tests, the query is quite slow the first time it is run and only fast on subsequent runs (presumably because the caches are cold on the first run).

Our table schema is as follows:

CREATE TABLE IF NOT EXISTS feeds (
    ts UInt64,
    uuid UUID,
    value Decimal(30, 5),

    date Date MATERIALIZED toDate(round(ts/1000)) CODEC(DoubleDelta),
    time DateTime MATERIALIZED toDateTime(round(ts/1000)) CODEC(DoubleDelta)
  ) ENGINE = ReplacingMergeTree()
    PARTITION BY toYYYYMM(date)
    PRIMARY KEY (uuid, ts)
    ORDER BY (uuid, ts)
    SETTINGS index_granularity=4096

Then to query, we execute this:

SELECT toStartOfInterval(time, INTERVAL '1 day') AS agreg,
       avg(value) AS value
FROM feeds
WHERE uuid='391becbb-39fd-4205-aba1-5088c9529d42'
  AND ts > 1629378660000
  AND ts < 1632057060000
GROUP BY agreg
ORDER BY agreg ASC

This represents around a month of data, aggregated daily.

We get this result:

┌───────────────agreg─┬──────────────value─┐
│ 2021-08-19 00:00:00 │  43.54795698924731 │
│ 2021-08-20 00:00:00 │  33.68460122699386 │
│ 2021-08-23 00:00:00 │   26.8002874251497 │
│ 2021-08-24 00:00:00 │  45.30143348623853 │
│ 2021-08-25 00:00:00 │  43.78214139344263 │
│ 2021-08-26 00:00:00 │  35.66887477313974 │
│ 2021-08-27 00:00:00 │  87.03632541133456 │
│ 2021-08-28 00:00:00 │  87.00372191011236 │
│ 2021-08-29 00:00:00 │  87.16366666666666 │
│ 2021-08-30 00:00:00 │  87.62234215885947 │
│ 2021-08-31 00:00:00 │  87.62653798256538 │
│ 2021-09-01 00:00:00 │  87.08854251012146 │
│ 2021-09-02 00:00:00 │ 44.809177939646204 │
│ 2021-09-03 00:00:00 │  20.85095168374817 │
│ 2021-09-06 00:00:00 │ 21.086067551266584 │
│ 2021-09-07 00:00:00 │ 20.904265569917744 │
│ 2021-09-08 00:00:00 │ 20.988032407407406 │
│ 2021-09-09 00:00:00 │ 21.070418848167538 │
│ 2021-09-10 00:00:00 │ 20.900507674144038 │
│ 2021-09-14 00:00:00 │  34.17651296829971 │
│ 2021-09-15 00:00:00 │  33.79448634590377 │
│ 2021-09-16 00:00:00 │   33.7209536423841 │
│ 2021-09-17 00:00:00 │   33.5361985472155 │
└─────────────────────┴────────────────────┘

Ok. 23 rows in set. Elapsed: 6.891 sec. Processed: 0 rows, 0.0B (0 rows/s, 0.0B/s)

7s seems quite slow for such a simple operation.

Is our way of doing this wrong?

Upvotes: 1

Views: 3879

Answers (1)

Denny Crane

Reputation: 13310

You are not making use of partitioning or the primary key index.

  1. You have three columns carrying the same temporal data. When you query both ts and time, the query reads twice as much data. Get rid of the time and date columns!

  2. Your table is partitioned by the date column, but your WHERE clause filters on ts (AND ts > 1629378660000), so partition elimination does not work! Use PARTITION BY toYYYYMM(toDateTime(intDiv(ts, 1000))) instead (see the revised schema sketched after this list).

  3. Do not use round(ts/1000); use intDiv(ts, 1000). Rounding can push a timestamp from the end of one second into the next, which may land it in the wrong second or even the wrong date.

  4. You are not using the primary key index, and the query optimizer is not smart enough to exploit it on its own: write GROUP BY uuid, agreg ORDER BY uuid, agreg ASC and try SET optimize_aggregation_in_order=1. Also make agreg a derivative of ts rather than of time.

That way the (uuid, agreg(ts)) pair matches the table's ORDER BY. A revised schema and query along these lines are sketched below.
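Putting points 1 to 4 together, a revised schema and query could look roughly like this (a minimal sketch reusing the same column names and literals as above; adjust to your needs and test against your own data):

CREATE TABLE IF NOT EXISTS feeds (
    -- ts stays in milliseconds; the date and time columns are dropped (point 1)
    ts UInt64,
    uuid UUID,
    value Decimal(30, 5)
  ) ENGINE = ReplacingMergeTree()
    -- partition key derived from ts, so filters on ts can prune partitions (points 2 and 3)
    PARTITION BY toYYYYMM(toDateTime(intDiv(ts, 1000)))
    ORDER BY (uuid, ts)
    SETTINGS index_granularity=4096

SELECT uuid,
       -- agreg derived from ts instead of the dropped time column
       toStartOfInterval(toDateTime(intDiv(ts, 1000)), INTERVAL '1 day') AS agreg,
       avg(value) AS value
FROM feeds
WHERE uuid='391becbb-39fd-4205-aba1-5088c9529d42'
  AND ts > 1629378660000
  AND ts < 1632057060000
-- leading with uuid lets the aggregation follow the table's ORDER BY (uuid, ts) (point 4)
GROUP BY uuid, agreg
ORDER BY uuid, agreg ASC
SETTINGS optimize_aggregation_in_order=1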

  5. Yes, even then ClickHouse will still be slower than dedicated time-series databases, because ClickHouse is not designed for such queries.
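If you want to verify that partition pruning and the primary key index actually kick in after these changes, recent ClickHouse versions can show index usage with EXPLAIN (assuming your version supports EXPLAIN indexes = 1):

EXPLAIN indexes = 1
SELECT avg(value)
FROM feeds
WHERE uuid='391becbb-39fd-4205-aba1-5088c9529d42'
  AND ts > 1629378660000
  AND ts < 1632057060000

The output lists, for the partition (MinMax) and primary key indexes, how many parts and granules were selected out of the total; if all parts are still being read, pruning is not happening.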

Upvotes: 1
