Reputation: 48432
I have a table that contains ranges of IP addresses (as integer values) and corresponding country, region and city assigned to that range. It looks as follows:
CREATE TABLE [dbo].[IpToRegion]
(
[BeginRange] [bigint] NOT NULL,
[EndRange] [bigint] NOT NULL,
[CountryCode] [varchar](10) NOT NULL,
[Country] [varchar](50) NOT NULL,
[Region] [varchar](100) NOT NULL,
[City] [varchar](100) NOT NULL
) ON [PRIMARY]
CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-20151031-193911] ON [dbo].[IpToRegion]
(
[BeginRange] ASC,
[EndRange] ASC
)
GO
There are 9.1M rows in this table. In order to find the location of a single IP address, I first convert it to a big int and then execute the following query:
DECLARE @IPNumber BIGINT
DECLARE @IPAddress varchar(20)
Set @IPNumber = (CONVERT(bigint, PARSENAME(@IPAddress,1)) + CONVERT(bigint, PARSENAME(@IPAddress,2)) * 256 + CONVERT(bigint, PARSENAME(@IPAddress,3)) * 65536 + CONVERT(bigint, PARSENAME(@IPAddress,4)) * 16777216)
Select City + ', ' + Region + ', ' + Country
From IpToRegion
Where @IPNumber Between BeginRange And EndRange
The problem is this query can take anywhere from 5 to 20 seconds to execute. Here is the query plan:
Of course, my question is how can this query take so long to execute? It's doing a seek on the clustered index and returns a single row. I could try a few different indexing strategies. But, at this point, I'm more curious as to why this query can perform so poorly.
Upvotes: 3
Views: 628
Reputation: 48432
It turns out that the clustered index I had on BeginRange + EndRange was not efficient as Vladimir Baranov stated in his answer. What I did was to create a PK / clustered index on BeginRange and a separate index on EndRange. Now the query performs instantly.
Upvotes: 0
Reputation: 32695
This kind of search can't be done efficiently with the index that you have.
If you look at details of the Index Seek
operator in the plan you'd see two predicates.
@IPNumber >= BeginRange
@IPNumber <= EndRange
Index helps to quickly O(log(n))
find the start (or end) of the range, but then it has to check the second predicate for the rest of the rows in the table.
Check out the actual number of rows read in the plan. It will be large.
If I'm not mistaken, there was somewhat similar (more complicated) question before. Even though, it was asked for Postgres, the approach would work in SQL Server as well. In that question this kind of search was done not once, but 600K times.
The answer to the question "How to make this search efficient" depends on few things. First: can you guarantee that IP ranges in your table do not overlap? In other words, can you guarantee that any search will return 0 or 1 rows?
If yes, adding a simple TOP(1)
to the query could be enough.
Upvotes: 5