Reputation: 9216
I have two huge tables, of 2bln rows:
CREATE TABLE transactions (
`id` UInt64,
`from_address` String,
`date` UInt64,
)
ENGINE = MergeTree
PRIMARY KEY from_address
ORDER BY from_address
SETTINGS index_granularity = 8192
and 380 mln rows
CREATE TABLE addresses_tmp (
`id` UInt64,
`address` String
)
ENGINE = MergeTree
PRIMARY KEY address
ORDER BY address
SETTINGS index_granularity = 8192
I need to perform a simple join on from_address and address fields, which are primary key fields. Clickhouse is able to select from these tables, ordered by these columns quite quickly, in less then a minute, which is acceptible for me for such a big JOIN. Both of these two queries consume below 100M memory at their peak, according to clickhouse log:
SELECT sum(id) FROM (SELECT * FROM addresses ORDER BY address)
SELECT sum(id) FROM (SELECT * FROM transactions ORDER BY from_address)
Now I try to perform join itself:
CREATE TABLE transactions_tmp
ENGINE = MergeTree
PRIMARY KEY id
AS SELECT
t.id AS to_id,
tx.date,
tx.id
FROM transactions AS tx
INNER JOIN addresses AS t ON t.address = tx.to_address
It may seem like it is trivial to join these tables just by iterating them simultaneously, indeed clickhouse always fails with error like this (I tried default merge algorithm, full_sorting_merge and grace_hash):
Memory limit (total) exceeded: would use 28.05 GiB (attempt to allocate chunk of 4342112 bytes), maximum: 27.86 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.
Or, if I use SETTINGS join_algorithm = 'partial_merge' it doesn't fail, but gets stuck at around 15%, join speed degrades to around 30K rows per second, so it will take few days to complete. So how do I force Clickhouse to perfom a simple merge join of sorted data?
Upvotes: 0
Views: 292
Reputation: 346
since your JOIN() would require too much memory using the default "hash" algorithm, you'll need to please use one of the alternate join algorithms that limit memory usage. I'd suggest trying the grace_hash algorithm:
https://clickhouse.com/docs/en/operations/settings/settings#join_algorithm
SETTINGS join_algorithm = 'grace_hash'
Upvotes: 0