RC.
RC.

Reputation: 28267

Efficient retrieval of overlapping IP range records via a single point

I have a table with millions of IP range records (start_num, end_num respectively) which I need to query via a single IP address in order to return all ranges which overlap that point. The query is essentially:

SELECT start_num
       , end_num
       , other_data_col 
FROM ip_ranges 
WHERE :query_ip BETWEEN start_num and end_num;

The table has 8 range partitions on start_num and has a local composite index on (start_num, end_num). Call it UNQ_RANGE_IDX. Statistics have been gathered on the table and index.

The query does an index range scan on the UNQ_RANGE_IDX index as expected and in some cases performs very well. The cases where it performs well are toward the bottom of the IP address space (i.e. something like 4.4.10.20) and performance is poor when at the upper end. (i.e. 200.2.2.2) I'm sure that the problem resides in the fact that on the lower end, the optimizer can prune all the partitions above the one that contains the applicable ranges due to the range partitioning on start_num providing the information necessary to prune. When querying on the top end of the IP spectrum, it can't prune the lower partitions and therefore it incurs the I/O of reading the additional index partitions. This can be verified via the number of CR_BUFFER_GETS when tracing the execution.

In reality, the ranges satisfying the query won't be in any partition but the one the query_ip is located in or the one immediately below or above it as the range size won't be greater than an A class and each partition covers many A classes each. I can make Oracle use that piece of information by specifying it in the where clause, but is there a way to convey this type of information to Oracle via stats, histograms, or a custom/domain index? It seems that there would be a common solution/approach to this sort of problem when searching for date ranges that cover a specific date as well.

I'm looking for solutions that use Oracle and its functionality to tackle this problem, but other solution types are appreciated. I've thought of a couple methods outside the scope of Oracle that would work, but I'm hoping for a better means of indexing, statistics gathering, or partitioning that will do the trick.

Requested Info:

CREATE TABLE IP_RANGES (
    START_NUM NUMBER NOT NULL, 
    END_NUM   NUMBER NOT NULL,
    OTHER     NUMBER NOT NULL,
    CONSTRAINT START_LTE_END CHECK (START_NUM <= END_NUM)
)
PARTITION BY RANGE(START_NUM)
(
    PARTITION part1 VALUES LESS THAN(1090519040) TABLESPACE USERS,
    PARTITION part2 VALUES LESS THAN(1207959552) TABLESPACE USERS
    ....<snip>....
    PARTITION part8 VALUES LESS THAN(MAXVALUE) TABLESPACE USERS
);

CREATE UNIQUE INDEX IP_RANGES_IDX ON IP_RANGES(START_NUM, END_NUM, OTHER) LOCAL NOLOGGING;

ALTER TABLE IP_RANGES ADD CONSTRAINT PK_IP_RANGE 
PRIMARY KEY(START_NUM, END_NUM, OTHER) USING INDEX IP_RANGES_IDX;

There is nothing special about the cutoff values selected for the range partitions. They are simply A class addresses where the number of ranges per partition would equate to about 1M records.

Upvotes: 5

Views: 1573

Answers (6)

Adam Musch
Adam Musch

Reputation: 13583

Your existing partitioning doesn't work, because Oracle is accessing the table's local index partitions by start_num, and it's got to check each one where there could be a match.

A different solution, assuming no ranges span a class A, would be to list partition by trunc(start_num / power(256,3)) -- the first octet. It may be worth breaking it out into a column (populated via trigger) and adding that as a filtering column to your query.

Your ~10m rows would then be, assuming an even distribution, be spread out into about 40k rows, which might be a lot faster to read through.


I ran the use case discussed below, assuming that no range spans a class A network.

create table ip_ranges
 (start_num         number           not null, 
  end_num           number           not null, 
  start_first_octet number           not null,
   ...
  constraint start_lte_end check (start_num <= end_num), 
  constraint check_first_octet check (start_first_octet = trunc(start_num / 16777216) )
)
partition by list ( start_first_octet )
(
partition p_0 values (0),
partition p_1 values (1),
partition p_2 values (2),
...
partition p_255 values (255)
);

-- run data population script, ordered by start_num, end_num

create index ip_ranges_idx01 on ip_ranges (start_num, end_num) local;

begin 
  dbms_stats.gather_table_stats (ownname => user, tabname => 'IP_RANGES', cascade => true);
end;
/

Using the base query above still performs poorly, as it's unable to do effective partition elimination:

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 | 25464 |  1840K|   845   (1)| 00:00:05 |       |       |
|   1 |  PARTITION LIST ALL                |                 | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |   825 |       |   833   (1)| 00:00:05 |     1 |   256 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
     141278  consistent gets
      94469  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

However, if we add the condition to allow Oracle to focus on a single partition, it makes a huge difference:

SQL> select * from ip_ranges
  2   where :ip_addr between start_num and end_num
  3     and start_first_octet = trunc(:ip_addr / power(256,3));

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |   183 | 13542 |   126   (2)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE             |                 |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |     3 |       |   322   (1)| 00:00:02 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Upvotes: 0

Gary Myers
Gary Myers

Reputation: 35401

I suggest you turn your 8 million row table into a bigger table. Google's IP (for me, at the moment) is coming up as

"66.102.011.104"

You store one record as "66.102.011" with the respective range(s) that it falls in. In fact you store at least one record for every "aaa.bbb.ccc". You'll probably end up with a table maybe five times as big, but one you can can pin-point the relevant record with just a few logical IOs each time rather than the hundreds/thousands for a partition scan.

I suspect any data you have is going to be a little out of date anyway (as various authorities around the world issue/re-issue ranges), so regenerating adjustments for that table on a daily/weekly basis shouldn't be a big problem.

Upvotes: 1

Gary Myers
Gary Myers

Reputation: 35401

Firstly, what is your performance requirement ?

Your partitions have a definite start value and end values which can be determined from ALL_PARTITIONS (or hard-coded) and used in a function (concept below but you'd need to amend it to go one partition forward/back).

You should then be able to code

SELECT * FROM ip_ranges
WHERE :query_ip BETWEEN start_num and end_num
AND start_num between get_part_start(:query_ip) and get_part_end(:query_ip);

Which should be able to lock it down to specific partition(s). However if, as you suggest, you can only lock it down to three out of eight partitions, that is still going to be a BIG scan. I'm posting another, more radical answer as well which may be more appropriate.

create or replace function get_part_start (i_val in number) 
                              return number deterministic is
  cursor c_1 is 
    select high_value from all_tab_partitions
    where table_name = 'IP_RANGES'
    order by table_owner, table_name;
  type tab_char is table of varchar2(20) index by pls_integer;
  type tab_num is table of number index by pls_integer;
  t_char  tab_char;
  t_num   tab_num;
  v_ind   number;
begin
  open c_1;
  fetch c_1 bulk collect into t_char;
  close c_1;
  --
  for i in 1..t_char.last loop
    IF t_char(i) != 'MAXVALUE' THEN
      t_num(to_number(t_char(i))) := null;
    END IF;
  end loop;
  --
  IF i_val > t_num.last then
    return t_num.last;
  ELSIF i_val < t_num.first then
    return 0;
  END IF;
  v_ind := 0;
  WHILE i_val >= t_num.next(v_ind) loop
    v_ind := t_num.next(v_ind);
    exit when v_ind is null;
  END LOOP;
  return v_ind;
end;
/

Upvotes: 0

ErikE
ErikE

Reputation: 50271

Would you please indicate if there are any uniform or ordered characteristics of your IP ranges? For example, I would normally expect IP ranges to lie on power-of-2 boundaries. Is that the case here, so we can assume that all ranges have an implicit net mask that starts with m ones followed by n zeroes where m + n = 32?

If so, there should be a way to exploit this knowledge and "step" into the ranges. Would it be possible to add an index on a calculated value with the count of the masked bits (0-32) or maybe the block size (1 to 2^32)?

32 seeks from masks 0 to 32 using just the start_num would be faster than a scan using BETWEEN start_num AND end_num.

Also, have you considered bit arithmetic as a possible means of checking for matches (again only if the ranges represent evenly-positioned chunks in power-of-2 sizes).

Upvotes: 0

Baski
Baski

Reputation: 839

The problem I see is Local Partitioned Index and as you said looks like Oracle doesn't do prune the partition list efficiently. Can you try with Global Index? Local partitioned index doesn't scale well for OLTP queries. In our environment we don't use any Local partitioned index.

Upvotes: 0

Adam Musch
Adam Musch

Reputation: 13583

I've had a similar problem in the past; the advantage I had was that my ranges were distinct. I've got several IP_RANGES tables, each for a specific context, and the largest is ~10 million or so records, unpartitioned.

Each of the tables I have is index-organized, with the primary key being (END_NUM, START_NUM). I've also got a unique index on (START_NUM, END_NUM), but it's not used in this case.

Using a random IP address (1234567890), your query takes about 132k consistent gets.

The query below returns in between 4-10 consistent gets (depending on IP) on 10.2.0.4.

select *
  from ip_ranges outr
 where :ip_addr between outr.num_start and outr.num_end
   and outr.num_end = (select /*+ no_unnest */
                              min(innr.num_end)
                             from ip_ranges innr
                            where innr.num_end >= :ip_addr);
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |     1 |    70 |     6   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN             | IP_RANGES_PK      |     1 |    70 |     3   (0)| 00:00:01 |
|   2 |   SORT AGGREGATE              |                   |     1 |     7 |            |          |
|   3 |    FIRST ROW                  |                   |   471K|  3223K|     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| IP_RANGES_PK      |   471K|  3223K|     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OUTR"."NUM_END"= (SELECT /*+ NO_UNNEST */ MIN("INNR"."NUM_END") FROM
              "IP_RANGES" "INNR" WHERE "INNR"."NUM_END">=TO_NUMBER(:IP_ADDR)) AND
              "OUTR"."NUM_START"<=TO_NUMBER(:IP_ADDR))
       filter("OUTR"."NUM_END">=TO_NUMBER(:IP_ADDR))
   4 - access("INNR"."NUM_END">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        968  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The NO_UNNEST hint is key; it tells Oracle to run that subquery once, not once for each row, and it gives an equality test for the index to use in the outer query.

Upvotes: 2

Related Questions