Reputation: 203
Very simple problem yet hard to find a solution. Address table with 2,498,739 rows has a field of min_ip and max_ip fields. These are the core anchors of the table for filtering.
The query is very simple.
SELECT *
FROM address a
WHERE min_ip < value
AND max_ip > value;
So it is logical to create an index for the min_ip and max_ip to make the query faster.
Index created for the following.
CREATE INDEX ip_range ON address (min_ip, max_ip) USING BTREE;
CREATE INDEX min_ip ON address (min_ip ASC) USING BTREE;
CREATE INDEX max_ip ON address (max_ip DESC) USING BTREE;
I did try to create just the first option (combination of min_ip and max_ip) but it did not work so I prepared at least 3 indexes to give MySQL more options for index selection. (Note that this table is pretty much static and more of a lookup table)
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| network | varchar(20) | YES | | NULL | |
| min_ip | int(11) unsigned | NO | MUL | NULL | |
| max_ip | int(11) unsigned | NO | MUL | NULL | |
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
Now, it should be straight forward to query the table with min_ip and max_ip as the filter criteria.
EXPLAIN
SELECT *
FROM address a
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
The query performed something around 0.120 to 0.200 secs. However, on a load test, the query rapidly degrades performance. MySQL server CPU usage sky rocket to 100% CPU usage on just a few simultaneous queries and performance degrades rapidly and does not scale up. Slow query on mysql server was turned on with 10 secs or higher, and eventually the select query shows up in the logs just after a few seconds of load test. So I checked the query with explain and found out that it did'nt use an index.
Explain plan result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ---------------------- ------ ------- ------ ------- -------------
1 SIMPLE a ALL ip_range,min_ip,max_ip (NULL) (NULL) (NULL) 2417789 Using where
Interestingly, it was able to determine ip_range, ip_min and ip_max as potential indexes but never use any of it as shown in the key column. I know I can use FORCE INDEX and tried to use explain plan on it.
EXPLAIN
SELECT *
FROM address a
FORCE INDEX (ip_range)
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
Explain plan with FORCE INDEX result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ------------- -------- ------- ------ ------- -----------------------
1 SIMPLE a range ip_range ip_range 4 (NULL) 1208894 Using index condition
With FORCE INDEX, yes it uses the ip_range index as key, and rows shows a subset from the query that does not use FORCE INDEX which is 1,208,894 from 2,417,789. So definitely, using the index should have better performance. (Unless I misunderstood the explain result)
But what is more interesting is, after a couple of test, I found out that on some instances, MySQL does use index even without FORCE INDEX. And my observation is when the value is small, it does use the index.
EXPLAIN
SELECT *
FROM address a
WHERE min_ip < 508496
AND max_ip > 508496;
Explain Result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ---------------------- -------- ------- ------ ------ -----------------------
1 SIMPLE a range ip_range,min_ip,max_ip ip_range 4 (NULL) 1 Using index condition
So, it just puzzled me that base on the value pass to the select query, MySQL decides when to use an index and when not to use an index. I can't imagine what is the basis for determining when to use the index on a certain value being passed to the query. I do understand that index may not be used if there is no matching index suitable in the WHERE condition but in this case, it is very clear the ip_range index which is an index based on min_ip and max_ip column is suitable for the WHERE condition in this case.
But the bigger problem I have is, what about other queries. Do I have to go and test those queries on a grand scale. But even then, as the data grows, can I rely and expect MySQL to use the index? Yes, I can always use FORCE INDEX to ensure it uses the index. But this is not standard SQL that works on all database. ORM frameworks may not be able to support FORCE INDEX syntax when they generate the SQL and it tightly couples your query with your index names.
Not sure if anyone has ever encountered this issue but this seems to be a very big problem for me.
Upvotes: 5
Views: 5305
Reputation: 161
I do agree to all the answers above. but you can try to make only one composite index like this:
create index ip_rang on address (min_ip ASC,max_ip DESC) using BTREE;
As you know index is also has the disadvantage of using your disk space so consider the optimal index for using.
Upvotes: 0
Reputation: 142298
Ranges like that are nasty to optimize. But I have a technique. It requires non-overlapping ranges and stores only a start_ip, not the end_ip (which is effectively available from the 'next' record). It provides stored routines to hide the messy code, involving ORDER BY ... LIMIT 1
and other tricks. For most operations it won't hit more than one block of data, unlike the obvious approaches that tend to fetch half or all the table.
Upvotes: 0
Reputation: 51888
Fully agree with Vatev and the others. Not only MySQL does that. Scanning the table is sometimes cheaper than looking at the index first then looking up corresponding entries on disk.
The only time when it for sure uses the index is, when it's a covering index, which means, that every column in the query (for this particular table of course) is present in the index. Meaning, if you need for example only the network column
SELECT network
FROM address a
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
then a covering index like
CREATE INDEX ip_range ON address (min_ip, max_ip, network) USING BTREE;
would only look at the index as there's no need to lookup additional data on disk at all. And the whole index could be kept in memory.
Upvotes: 5