Keith Sorbo
Keith Sorbo

Reputation: 25

Speeding up Between Search

I am using MaxMind free databases to do IP lookups. I convert the data to the following table:

CREATE TABLE [dbo].[GeoBlocks](
    [StartIPNum] [varchar](50) NULL,
    [EndIPNumb] [varchar](50) NULL,
    [LocationNum] [varchar](50) NULL,
    [PostalCode] [varchar](50) NULL,
    [Latitude] [varchar](50) NULL,
    [Longitude] [varchar](50) NULL)

There are about 3.5M records in this lookup table.

My goal is to determine the LocationNum for an IP(decimal form) by finding the record where the IP is between StartIPNum and EndIPNum

My stored procedure looks like this: Parameter: @DecimalIP bigint

select GeoBlocks.StartIPNum ,@DecimalIP as DecimalIp 
     ,GeoBlocks.Postalcode ,GeoBlocks.Latitude as Latitude
     ,GeoBlocks.Longitude as Longitude
     from GeoBlocks
    where @DecimalIP between GeoBlocks.StartIPNum  and GeoBlocks.EndIPNumb

I have created unique indexes on StartIPNum and EndIPNum.

However, when I run this, SQL server does a table scan for the Where portion of the query. This query takes 650-750ms. (Most queries on my server take 0-2ms)

How do I speed up this query?

Added Sample Data:

StartIPNum  EndIPNumb   LocationNum PostalCode  Latitude    Longitude
1350218632  1350218639  2782113                  48.2000    16.3667
1350218640  1350218655  2782113                  48.2000    16.3667
1350218656  1350218687  2782113                  48.2000    16.3667
1350218688  1350218751  2782113                  48.2000    16.3667
1350218752  1350218783  2782113                  48.2000    16.3667

Upvotes: 0

Views: 75

Answers (2)

HABO
HABO

Reputation: 15816

Update:

To summarize information scattered among various comments:

  1. The IP address columns are VarChar(50) strings containing decimal values without left padding. An index on those columns will sort them alphabetically, not numerically, i.e. "10" < "2". (With left padding the sort will be correct numerically as well: "10" > "02".)

  2. The WHERE clause (where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb) uses mixed datatypes. @DecimalIP is a BIGINT while the two columns are VarChar(50). SQL handles operations among mixed datatypes by implementing a data type precedence scheme. (Ref.) This causes the IP addresses in each row to be converted from strings to BIGINT values, hence the comparison is done numerically and the "expected" results are returned at a considerable cost. The indexes are (all but) useless in this case.

  3. Changing the columns to BIGINT will allow the use of an index to improve performance and ensure that comparisons are done numerically rather than alphabetically. An single index containing both the StartIPNum and EndIPNumb columns will greatly improve performance. Note that if overlapping address ranges are not allowed then the index will effectively be unique on StartIPNum and could be replaced with an index on StartIPNum with EndIPNumb as an included column for performance.

Original Answer:

If you are using IPV4 addresses in dotted notation, e.g. "192.168.0.42", you can convert the strings into BIGINT values with this UDF:

create function [dbo].[IntegerIPV4Address]( @IPV4Address VarChar(16) )
  returns BigInt
  with SchemaBinding
  begin
  declare @Dot1 as Int = CharIndex( '.', @IPV4Address );
  declare @Dot2 as Int = CharIndex( '.', @IPV4Address, @Dot1 + 1 );
  declare @Dot3 as Int = CharIndex( '.', @IPV4Address, @Dot2 + 1 );
  return Cast( Substring( @IPV4Address, 0, @Dot1 ) as BigInt ) * 0x1000000 +
    Cast( Substring( @IPV4Address, @Dot1 + 1, @Dot2 - @Dot1 - 1 ) as BigInt ) * 0x10000 +  
    Cast( Substring( @IPV4Address, @Dot2 + 1, @Dot3 - @Dot2 - 1 ) as BigInt ) * 0x100 +
    Cast( Substring( @IPV4Address, @Dot3 + 1, Len( @IPV4Address ) * 1 ) as BigInt );
  end

You can either store the integer values or create an index on a computed column based on the functions result. Note that you need to change your query to reference the integer column in the WHERE clause.

If you store the values as integers the following function will convert them back to normalized strings where each part of the address is three digits. These values can be used in comparisons since they will sort the same way both alphabetically and numerically.

create function [dbo].[NormalizedIPV4Address]( @IntegerIPV4Address as BigInt )
  returns VarChar(16)
  with SchemaBinding -- Deterministic function.
  begin
  declare @BinaryAddress as VarBinary(4) = Cast( @IntegerIPV4Address as VarBinary(4) );
  return Right( '00' + Cast( Cast( Substring( @BinaryAddress, 1, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 2, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 3, 1 ) as Int ) as VarChar(3) ), 3 ) +
    '.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 4, 1 ) as Int ) as VarChar(3) ), 3 )
  end

You could round-trip the string values in your table to get them all into "normalized" form so that they sort correctly by using both functions. Not an ideal solution since it requires that all future inserts and updates be normalized, but it may help for the moment.

Upvotes: 1

SQL Police
SQL Police

Reputation: 4196

I guess that the index is not properly set. You can improve it like this:

  1. Go into SQL Server Management Studio, and open a new query window
  2. Select from Menu: Query->Include Actual Execution Plan (Ctrl+M)
  3. Type your query and execute it

Now the query will execute, and the execution plan will be displayed. And if the index is bad, it will show you a hint which index is missing and what you can do.

It will display a precise SQL statement which you need to create the index. Just copy + paste that statement and execute it, and then your index should work.

Upvotes: 0

Related Questions