Reputation: 1110
I'm playing with data in csv format from https://dev.maxmind.com/geoip/geoip2/geolite2/
.
Generally, it's data that map from ip block to asn and country.
I have 2 table both are Memory
engine, first has 299727 records, second has 406685.
SELECT *
FROM __ip_block_to_country
LIMIT 5
┌─network────┬───────id─┬───min_ip─┬───max_ip─┬─geoname_id─┬─country_iso_code─┬─country_name─┐
│ 1.0.0.0/24 │ 16777216 │ 16777217 │ 16777472 │ 2077456 │ AU │ Australia │
│ 1.0.1.0/24 │ 16777472 │ 16777473 │ 16777728 │ 1814991 │ CN │ China │
│ 1.0.2.0/23 │ 16777728 │ 16777729 │ 16778240 │ 1814991 │ CN │ China │
│ 1.0.4.0/22 │ 16778240 │ 16778241 │ 16779264 │ 2077456 │ AU │ Australia │
│ 1.0.8.0/21 │ 16779264 │ 16779265 │ 16781312 │ 1814991 │ CN │ China │
└────────────┴──────────┴──────────┴──────────┴────────────┴──────────────────┴──────────────┘
SELECT *
FROM __ip_block_to_asn
LIMIT 5
┌─network──────┬─autonomous_system_number─┬─autonomous_system_organization─┬───────id─┬─subnet_count─┬───min_ip─┬───max_ip─┐
│ 1.0.0.0/24 │ 13335 │ Cloudflare Inc │ 16777216 │ 255 │ 16777217 │ 16777472 │
│ 1.0.4.0/22 │ 56203 │ Gtelecom-AUSTRALIA │ 16778240 │ 1023 │ 16778241 │ 16779264 │
│ 1.0.16.0/24 │ 2519 │ ARTERIA Networks Corporation │ 16781312 │ 255 │ 16781313 │ 16781568 │
│ 1.0.64.0/18 │ 18144 │ Energia Communications,Inc. │ 16793600 │ 16383 │ 16793601 │ 16809984 │
│ 1.0.128.0/17 │ 23969 │ TOT Public Company Limited │ 16809984 │ 32767 │ 16809985 │ 16842752 │
└──────────────┴──────────────────────────┴────────────────────────────────┴──────────┴──────────────┴──────────┴──────────┘
Now, i want to exam which country that covers entire ip pool of one asn. The below query is just to obtain index of statisfied country.
SELECT idx from(
SELECT
(
SELECT groupArray(min_ip),groupArray(max_ip),groupArray(country_iso_code),groupArray(country_name)
FROM __ip_block_to_country
) t,
arrayFilter((i,mii, mai) -> min_ip >= mii and max_ip <= mai, arrayEnumerate(t.1), t.1, t.2) as idx
FROM __ip_block_to_asn
);
I got following exception:
Received exception from server (version 1.1.54394):
Code: 241. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Memory limit (for query) exceeded: would use 512.02 GiB (attempt to allocate chunk of 549755813888 bytes), maximum: 37.25 GiB.
My question is:
SELECT groupArray(min_ip),groupArray(max_ip),groupArray(country_iso_code),groupArray(country_name)
is executed along with every record of __ip_block_to_asn
, then query needs so much memory. Is that true to my query ?Upvotes: 2
Views: 983
Reputation: 911
Scalar subquery is executed only once.
But to execute arrayFilter
, arrays are multiplied by number of rows of processed blocks from __ip_block_to_asn
table. It is something like cross join of two tables.
To overcome this, you can use smaller block size for SELECT from __ip_block_to_asn
.
It is controlled by max_block_size
setting. But for Memory
tables, blocks always have the same size as when they was inserted into a table, regardless to max_block_size
setting during SELECT. To allow flexible block size, you can reload this table to TinyLog
engine.
CREATE TABLE __ip_block_to_asn2 ENGINE = TinyLog AS SELECT * FROM __ip_block_to_asn
Then execute:
SET max_block_size = 10;
SELECT idx from(
SELECT
(
SELECT groupArray(min_ip),groupArray(max_ip),groupArray(country_iso_code),groupArray(country_name)
FROM __ip_block_to_country
) t,
arrayFilter((i,mii, mai) -> min_ip >= mii and max_ip <= mai, arrayEnumerate(t.1), t.1, t.2) as idx
FROM __ip_block_to_asn2
);
Upvotes: 3