questionmark
questionmark

Reputation: 187

Clickhouse - "Too many partitions for single INSERT block"

During a reload of a replicated MySQL database to Clickhouse, using "clickhouse-mysql" I run into the "Too many partitions for single INSERT block" and I seem unable to progress.

So far, some of the things I've tried:

  1. setting "max_partitions_per_insert_block" to ZERO, to see if it would get through.
  2. using --mempool-max-rows to 5000, 10.000 and 20.000, so that it skips the 100.000 default loading
  3. using PARTITION BY toYYYYMMDD(created) when creating the table

The Clickhouse table is created as, which is pretty similar from how the automatic creation (--create-table) was with the exception it did miss a few of the NULLABLE types.:

CREATE TABLE DB.DB__main (
`id` Int64,
 `user_id` Int64,
 `screenname` String,
 `created` DateTime,
 `tweet_id` Int64,
 `tweet` String,
 `retweet_count` Nullable(Int32),
 `mediastatus` Nullable(String),
 `country` Nullable(String),
 `countrycode` Nullable(String),
 `city` Nullable(String),
 `latitude0` Nullable(String),
 `longitude0` Nullable(String),
 `latitude1` Nullable(String),
 `longitude1` Nullable(String),
 `latitude2` Nullable(String),
 `longitude2` Nullable(String),
 `latitude3` Nullable(String),
 `longitude3` Nullable(String),
 `datetime` DateTime,
 `datetime_update` Nullable(DateTime),
 `status` Nullable(String),
 `div0` Nullable(String),
 `div1` Nullable(String),
 `div2` Nullable(Int64),
 `datasource` Nullable(String)
) ENGINE = ReplacingMergeTree() PARTITION BY toYYYYMM(created) ORDER BY (id, user_id, screenname, created, tweet_id, datetime)

Also, why do the schema get repeated DB.DB__tablename? I got the weird situation when I first started using Clickhouse and clickhouse-mysql --create-table. It stopped when it was to start migrate the content and it took a while before I realized the table name where changed from "schema"."table-name" to "schema"."schema__table-name". After renaming the table-names the --migrate-table could run.

Upvotes: 4

Views: 10722

Answers (1)

Denny Crane
Denny Crane

Reputation: 13350

max_partitions_per_insert_block -- Limit maximum number of partitions in single INSERTed block. Zero means unlimited. Throw exception if the block contains too many partitions. This setting is a safety threshold, because using large number of partitions is a common misconception.

By default max_partitions_per_insert_block = 100

So PARTITION BY toYYYYMMDD(created) your insert will fail if your insert covers more than 100 different days. PARTITION BY toYYYYMM(created) your insert will fail if your insert covers more than 100 different months.

Nullable -- eats up to twice disk space and up to twice slower than notNullable.

schema get repeated DB.DB__tablename -- ask Altinity the creator of clickhouse-mysql -- looks like a bug.

Upvotes: 9

Related Questions