Reputation: 77
I have hundreds of thousands of IP's and want to identify which ones are within a certain range. Ranges:
64.233.160.0 / 8192
66.102.0.0 / 4096
66.249.64.0 / 8192
72.14.192.0 / 16384
74.125.0.0 / 65536
209.85.128.0 / 32768
216.239.32.0 / 8192
So I converted these ranges to the following:
64.233.160.0 - 64.233.192.0
66.102.0.0 - 66.102.16.0
66.249.64.0 - 66.249.96.0
72.14.192.0 - 72.15.0.0
74.125.0.0 - 74.126.0.0
209.85.128.0 - 209.86.0.0
216.239.32.0 - 216.239.64.0
So now I want to query if an IP address is within any of these ranges. SQL isn't going to understand the octets so I don't know what to do.
Could use some Hex2Dec/Dec2Hex conversions?
I figure this should be something that has been done before, I'm sure I'm not the first person to try and identify particular ip's in a list using an ip range.
I will be doing look ups on multiple IP addresses so some might be 20.0.1.123 and another might be 124.123.123.1 ie the format of the octets won't be the same
Upvotes: 5
Views: 2438
Reputation: 11
As part of our GDPR processing, we transformed the IP range dataset into 3 key-value dataset per octet (without the 4th), and it worked well on over 10MM records through SQL in multiple warehouses.
Basically, instead of doing:
FROM DATA
JOIN IP_RANGE ON DATA.IP_NUM BETWEEN IP_RANGE.IP_NUM_FROM AND IP_RANGE.IP_NUM_TO
we tested:
FROM DATA
LEFT JOIN IP_OCTET1 ON DATA.IP_OCTET1 = IP_OCTET1.IP_OCTET1
LEFT JOIN IP_OCTET2 ON DATA.IP_OCTET1 = IP_OCTET2.IP_OCTET1 AND DATA.IP_OCTET2 = IP_OCTET2.IP_OCTET2
LEFT JOIN IP_OCTET3 ON DATA.IP_OCTET1 = IP_OCTET3.IP_OCTET1 AND DATA.IP_OCTET2 = IP_OCTET3.IP_OCTET2 AND DATA.IP_OCTET3 = IP_OCTET3.IP_OCTET3
you can read more about it in our latest blog: https://cloudinary.com/blog/identifying_countries_by_ip_address_in_columnar_databases_through_sql
Disclaimer: I'm the author of the article.
Upvotes: 1
Reputation: 172994
For BigQuery Standard SQL the way to Convert IP into integer is as below
#standardSQL
SELECT NET.IPV4_TO_INT64(NET.IP_FROM_STRING('64.233.160.0'))
Along with SQL UDF the usage can be simplified down to something like below
#standardSQL
CREATE TEMP FUNCTION ip2int(ip STRING) AS (
NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip))
);
WITH Ranges AS (
SELECT '64.233.160.0' AS IP1, '64.233.192.0' AS IP2 UNION ALL
SELECT '66.102.0.0', '66.102.16.0' UNION ALL
SELECT '66.249.64.0', '66.249.96.0' UNION ALL
SELECT '72.14.192.0', '72.15.0.0' UNION ALL
SELECT '74.125.0.0', '74.126.0.0' UNION ALL
SELECT '209.85.128.0', '209.86.0.0' UNION ALL
SELECT '216.239.32.0', '216.239.64.0'
),
IPs AS (
SELECT '64.233.160.2' AS IP UNION ALL
SELECT '72.14.192.101'
)
SELECT *
FROM IPs AS i
JOIN Ranges AS r
ON ip2int(IP) BETWEEN ip2int(IP1) AND ip2int(IP2)
with Output as
IP IP1 IP2
72.14.192.101 72.14.192.0 72.15.0.0
64.233.160.2 64.233.160.0 64.233.192.0
More info here about NET functions
and SQL UDF
Upvotes: 1
Reputation: 981
Pentium10's answer is correct for the Legacy Bigquery syntax. For anyone who happens to be using StandardSQL Bigquery Syntax you are looking for this:
NET.IP_FROM_STRING('64.233.160.0') returns 1089052672
Upvotes: 0
Reputation: 207863
You can us the IP functions for this:
PARSE_IP('64.233.160.0') returns 1089052672
and then you can wrap them in a BETWEEN state.
Upvotes: 3
Reputation: 91912
IP numbers are really just integers. What you have done here is that you have saved them as human-readable strings. You will need to convert them back to their original integer representation so that you can state a query with normal BETWEEN.
Upvotes: 3