Thang Nguyen
Thang Nguyen

Reputation: 1110

memory used and execution order in sub-query

I'm playing with data in csv format from https://dev.maxmind.com/geoip/geoip2/geolite2/. Generally, it's data that map from ip block to asn and country.

I have 2 table both are Memory engine, first has 299727 records, second has 406685.

SELECT *
FROM __ip_block_to_country 
LIMIT 5

┌─network────┬───────id─┬───min_ip─┬───max_ip─┬─geoname_id─┬─country_iso_code─┬─country_name─┐
│ 1.0.0.0/24 │ 16777216 │ 16777217 │ 16777472 │    2077456 │ AU               │ Australia    │
│ 1.0.1.0/24 │ 16777472 │ 16777473 │ 16777728 │    1814991 │ CN               │ China        │
│ 1.0.2.0/23 │ 16777728 │ 16777729 │ 16778240 │    1814991 │ CN               │ China        │
│ 1.0.4.0/22 │ 16778240 │ 16778241 │ 16779264 │    2077456 │ AU               │ Australia    │
│ 1.0.8.0/21 │ 16779264 │ 16779265 │ 16781312 │    1814991 │ CN               │ China        │
└────────────┴──────────┴──────────┴──────────┴────────────┴──────────────────┴──────────────┘

SELECT *
FROM __ip_block_to_asn 
LIMIT 5

┌─network──────┬─autonomous_system_number─┬─autonomous_system_organization─┬───────id─┬─subnet_count─┬───min_ip─┬───max_ip─┐
│ 1.0.0.0/24   │                    13335 │ Cloudflare Inc                 │ 16777216 │          255 │ 16777217 │ 16777472 │
│ 1.0.4.0/22   │                    56203 │ Gtelecom-AUSTRALIA             │ 16778240 │         1023 │ 16778241 │ 16779264 │
│ 1.0.16.0/24  │                     2519 │ ARTERIA Networks Corporation   │ 16781312 │          255 │ 16781313 │ 16781568 │
│ 1.0.64.0/18  │                    18144 │ Energia Communications,Inc.    │ 16793600 │        16383 │ 16793601 │ 16809984 │
│ 1.0.128.0/17 │                    23969 │ TOT Public Company Limited     │ 16809984 │        32767 │ 16809985 │ 16842752 │
└──────────────┴──────────────────────────┴────────────────────────────────┴──────────┴──────────────┴──────────┴──────────┘

Now, i want to exam which country that covers entire ip pool of one asn. The below query is just to obtain index of statisfied country.

SELECT idx from(
SELECT 
    (
        SELECT groupArray(min_ip),groupArray(max_ip),groupArray(country_iso_code),groupArray(country_name)
        FROM __ip_block_to_country
    ) t,
    arrayFilter((i,mii, mai) -> min_ip >= mii and max_ip <= mai, arrayEnumerate(t.1), t.1, t.2) as idx
FROM __ip_block_to_asn
);

I got following exception: Received exception from server (version 1.1.54394): Code: 241. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Memory limit (for query) exceeded: would use 512.02 GiB (attempt to allocate chunk of 549755813888 bytes), maximum: 37.25 GiB.

My question is:

Upvotes: 2

Views: 983

Answers (1)

uYSIZfoz
uYSIZfoz

Reputation: 911

Scalar subquery is executed only once.

But to execute arrayFilter, arrays are multiplied by number of rows of processed blocks from __ip_block_to_asn table. It is something like cross join of two tables.

To overcome this, you can use smaller block size for SELECT from __ip_block_to_asn. It is controlled by max_block_size setting. But for Memory tables, blocks always have the same size as when they was inserted into a table, regardless to max_block_size setting during SELECT. To allow flexible block size, you can reload this table to TinyLog engine.

CREATE TABLE __ip_block_to_asn2 ENGINE = TinyLog AS SELECT * FROM __ip_block_to_asn

Then execute:

SET max_block_size = 10;

SELECT idx from(
SELECT 
(
    SELECT groupArray(min_ip),groupArray(max_ip),groupArray(country_iso_code),groupArray(country_name)
    FROM __ip_block_to_country
) t,
arrayFilter((i,mii, mai) -> min_ip >= mii and max_ip <= mai, arrayEnumerate(t.1), t.1, t.2) as idx
FROM __ip_block_to_asn2
);

Upvotes: 3

Related Questions