Reputation: 197
I have a PHP/MySQL geo-IP script which takes a user's IP address, converts it to a long integer, and searches an IP range table for the single geographical location ID that the user's IP falls into:
$iplong = ip2long($_SERVER['REMOTE_ADDR']);
SELECT id FROM geoip
WHERE ".$iplong." BETWEEN range_begin AND range_end
ORDER BY range_begin DESC LIMIT 1
The "geoip" table contains 2.5M rows. Both the "range_begin" and "range_end" column are Unique Indexes. IP ranges don't seem to overlap. Sometimes this query takes about 1 second to complete, but I was hoping there was a way to improve it as it is the slowest query on my site.
Thanks
EDIT: I changed my query to:
SELECT * FROM geoip
WHERE range_begin <= ".$iplong." AND range_end >= ".$iplong."
ORDER BY range_begin DESC LIMIT 1
I now have a UNIQUE composite index on (range_begin, range_end). I ran EXPLAIN on the query, and it looks like it still scans through 1.2M rows:
id: 1
select_type: Simple
table: geoip
type: range
possible_keys: range_begin
key: range_begin
key_len: 8
ref: NULL
rows: 1282026
Extra: Using Index Condition
Upvotes: 3
Views: 2711
Reputation: 11
There is an even simpler way that escaped me until I looked at the data.
First, run
SELECT * FROM Ip2location WHERE ip_from <= $IPAddress ORDER BY ip_from DESC LIMIT 1
This returns the row with the highest ip_from that is less than or equal to the value you are searching for.
Next, check that the returned row's ip_to is greater than or equal to the IP; if it is, you are assured the IP falls in that range.
If the ip_to is less than the IP, there is no record covering this IP.
Simple and executes quickly!
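The two-step check above can be sketched in Python with a binary search over an in-memory list (a standalone illustration with made-up ranges and location labels, not the MySQL query itself):

```python
import bisect

# Hypothetical sorted ranges: (ip_from, ip_to, location). Values are invented.
ranges = [
    (16777216, 16777471, "AU"),   # 1.0.0.0 - 1.0.0.255
    (16777472, 16778239, "CN"),   # 1.0.1.0 - 1.0.3.255
    (16778240, 16779263, "AU"),   # 1.0.4.0 - 1.0.7.255
]
starts = [r[0] for r in ranges]

def lookup(iplong):
    """Find the row with the largest ip_from <= iplong, then verify ip_to."""
    i = bisect.bisect_right(starts, iplong) - 1
    if i < 0:
        return None                          # iplong is below every range
    ip_from, ip_to, location = ranges[i]
    # The verification step: a gap between ranges means no match.
    return location if iplong <= ip_to else None
```

This mirrors what the `ORDER BY ip_from DESC LIMIT 1` query does with a B-tree index: one descent to the candidate row, then a single comparison.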
Upvotes: 1
Reputation: 31
I was dealing with a similar issue, where I had to search a database of about 4 million IP ranges, and found a nice solution that brought the number of scanned rows down from 4 million to about 5 (depending on the IP):
This SQL Statement:
SELECT id FROM geoip WHERE $iplong BETWEEN range_begin AND range_end
is transformed to:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_end >= $iplong
The issue is that MySQL retrieves all rows matching 'range_begin <= $iplong' and then has to check each one against 'range_end >= $iplong'. The first condition alone matched about 2 million rows, every one of which had to be checked against range_end.
This can, however, be sped up dramatically by adding one more AND condition:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_begin >= $iplong-65535 AND range_end >= $iplong
The statement
range_begin <= $iplong AND range_begin >= $iplong-65535
retrieves only entries where range_begin lies between $iplong-65535 and $iplong. In my case, this reduced the number of retrieved rows from 4 million to about 5, and the script runtime dropped from several minutes to a few seconds.
Note on 65535: for my table, this is the maximum distance between range_begin and range_end, i.e. (range_end - range_begin) <= 65535 for all my rows. If you have larger IP ranges you must increase this constant; if you have smaller IP ranges you can decrease it. If the constant is too large (for example 4 billion), you will not save any query time.
For this query, you only need an index on range_begin.
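Here is a runnable sketch of the bounded query, using SQLite in place of MySQL and a handful of made-up rows; MAX_RANGE_SIZE is the per-table constant described above, and the ORDER BY / LIMIT is carried over from the question's query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geoip (id INTEGER, range_begin INTEGER, range_end INTEGER)")
conn.execute("CREATE INDEX idx_begin ON geoip (range_begin)")
conn.executemany(
    "INSERT INTO geoip VALUES (?, ?, ?)",
    [(1, 16777216, 16777471), (2, 16777472, 16778239), (3, 16778240, 16779263)],
)

MAX_RANGE_SIZE = 65535  # assumed upper bound on (range_end - range_begin)

def lookup(iplong):
    # The extra lower bound on range_begin turns the open-ended scan
    # into a narrow index range of at most MAX_RANGE_SIZE addresses.
    row = conn.execute(
        """SELECT id FROM geoip
           WHERE range_begin <= ? AND range_begin >= ? AND range_end >= ?
           ORDER BY range_begin DESC LIMIT 1""",
        (iplong, iplong - MAX_RANGE_SIZE, iplong),
    ).fetchone()
    return row[0] if row else None
```

Using parameterized placeholders here also avoids interpolating $iplong into the SQL string, which is good practice even when the value comes from ip2long.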
Upvotes: 3
Reputation: 48387
It's a very useful exercise to spend some time thinking about why a conventional index is useless in a scenario like this. Indeed, if you can force the query to use the index, you will probably find it slower than a full table scan.
Explaining why would take more space than is available here. There is a solution, which is to treat the IP address space as a one-dimensional space and use spatial indexing. But MySQL spatial indexes only work in two dimensions, so you need to map the coordinate into a two-dimensional space as described here
Note that the greater-than/LIMIT method, although faster than the spatial index, becomes messy when you start dealing with nested sub-nets.
Upvotes: 1