user12642493
user12642493

Reputation: 89

Perform an SQL query excluding a specific partition name

Let's imagine that the table my_table is divided into 1000 partitions as the following example:

P1, P2, P3, ... , P997, P998, P999, P1000

Partitions are organized by dates, mostly a partition per day. E.g.:

P0    < 01/01/2000 => Contains around 472M records
P1    = 01/01/2000 => Contains around  15k records
P2    = 02/01/2000 => Contains around  15k records
P3    = 03/01/2000 => Contains around  15k records
...   = ../../.... => Contains around  ... records
P997  = 07/04/2000 => Contains around  15k records
P998  = 08/04/2000 => Contains around  15k records
P999  = 09/04/2000 => Contains around  15k records
P1000 = 10/04/2000 => Contains around  15k records

Please notice that P0 is < to 01/01/2000, NOT =

CURRENT SITUATION:

When looking for a specific record without knowing the date, I am doing a:
SELECT * FROM my_schema.my_table WHERE ... ;
But this take too much time because it does include P0 (30s).

IMPOSSIBLE SOLUTION:

So the best idea would be to execute an SQL query such as:
SELECT * FROM my_schema.my_table FROM PARTITION(P42) WHERE ... ;
But we never know in which partition is the record. We don't know either the date associated to the partition. And of course we won't loop over all partitions 1 by 1

BAD SOLUTION:

I could be clever by doing 5 by 5:
SELECT * FROM my_schema.my_table FROM PARTITION(P40,P41,P42,P43,P44) WHERE ... ;
However same issue as above, I won't loop over all partitions, even 5 by 5

LESS BAD SOLUTION:

I won't run either do (excluding P0 in the list):
SELECT * FROM my_schema.my_table FROM PARTITION(P1,P2,...,P99,P100) WHERE ... ;
The query would be too long and I would have to compute for each request the list of partitions names since it could not always start by P1 or end by P100 (each days some partitions are dropped, some are created)

CLEVER SOLUTION (but does it exist?):

How can I do something like this?
SELECT * FROM my_schema.my_table NOT IN PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table PARTITION(*,-P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table LESS PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table EXCLUDE PARTITION(P0) WHERE ... ;

Is there any way to do that?

I mean a way to select all partitions expect one or some of them? Note: I don't know in advance the value of the dateofSale. Inside the table, we have something like

CREATE TABLE my_table 
(
    recordID NUMBER(16) NOT NULL,     --not primary
    dateOfSale DATE NOT NULL,         --unknown
    ....
    <other fields>
)

Before you answer, read the following:

  1. Index usage: yes, it is already optimized, but remember, we do not know the partitioning date
  2. No we won't drop records in P0, we need to keep them for at least few years (3, 5 and sometimes 10 according each country laws)
  3. We can "split" P0 into several partitions, but that won't solve the issue with a global SELECT
  4. We cannot move that data into a new table, we need them to be kept in this table since we have multiple services and softwares performing select in it. We would have to edit to much code to add a query for the second table for each services and back-end.
  5. We cannot do an aka WHERE date > 2019 clause and index the date field for multiples reasons that would take too much time to explain here.

Upvotes: 1

Views: 1513

Answers (3)

Jon Heller
Jon Heller

Reputation: 36817

Although there is no syntax to exclude a specific partition, you can build a pipelined table function that dynamically builds a query that uses every partition except for one.

The table function builds a query like the one below. The function uses the data dictionary view USER_TAB_PARTITIONS to get the partition names to build the SQL, uses dynamic SQL to execute the query, and then pipes the results back to the caller.

select * from my_table partition (P1) union all
select * from my_table partition (P2) union all
...
select * from my_table partition (P1000);

Sample schema

CREATE TABLE my_table 
(
    recordID NUMBER(16) NOT NULL,     --not primary
    dateOfSale DATE NOT NULL,         --unknown
    a NUMBER
)
partition by range (dateOfSale)
(
    partition p0 values less than (date '2000-01-01'),
    partition p1 values less than (date '2000-01-02'),
    partition p2 values less than (date '2000-01-03')
);

insert into my_table
select 1,date '1999-12-31',1 from dual union all
select 2,date '2000-01-01',1 from dual union all
select 3,date '2000-01-02',1 from dual;

commit;

Package and function

create or replace package my_table_pkg is
    type my_table_nt is table of my_table%rowtype;
    function get_everything_but_p0 return my_table_nt pipelined;
end;
/

create or replace package body my_table_pkg is
    function get_everything_but_p0 return my_table_nt pipelined is
        v_sql clob;

        v_results my_table_nt;
        v_cursor sys_refcursor;
    begin
        --Build SQL that referneces all partitions.
        for partitions in
        (
            select partition_name
            from user_tab_partitions
            where table_name = 'MY_TABLE'
                and partition_name <> 'P0'
        ) loop
            v_sql := v_sql || chr(10) || 'union all select * from my_table ' ||
                'partition (' || partitions.partition_name || ')';
        end loop;

        v_sql := substr(v_sql, 12);

        --Print the query for debugging:
        dbms_output.put_line(v_sql);

        --Gather the results in batches and pipe them out.
        open v_cursor for v_sql;

        loop
            fetch v_cursor bulk collect into v_results limit 100;
            exit when v_results.count = 0;
            for i in 1 .. v_results.count loop
                pipe row (v_results(i));
            end loop;
        end loop;

        close v_cursor;
    end;
end;
/

The package uses 12c's ability to define types in package specifications. If you build this in 11g or below, you'll need to create SQL types instead. This package only works for one table, but if necessary there are ways to create functions that work with any table (using Oracle data cartridge or 18c's polymorphic table functions).

Sample query

SQL> select * from table(my_table_pkg.get_everything_but_p0);

  RECORDID DATEOFSAL          A
---------- --------- ----------
         2 01-JAN-00          1
         3 02-JAN-00          1

Performance

This function should perform almost as well as the clever solution you were looking for. There will be overhead because the rows get passed through PL/SQL. But most importantly, the function builds a SQL statement that partition prunes away the large P0 partition.

One possible issue with this function is that the optimizer has no visibility inside it and can't create a good row cardinality estimate. If you use the function as part of another large SQL statement, be aware that the optimizer will blindly guess that the function returns 8168 rows. That bad cardinality guess may lead to a bad execution plan.

Upvotes: 0

Connor McDonald
Connor McDonald

Reputation: 11591

The query below, ie two queries in a UNION ALL but I only want 1 row, will stop immediately a row is found. We do not need to go into the second part of the UNION ALL if we get a row in the first.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1
 11  /

See https://connor-mcdonald.com/golden-oldies/first-match-written-15-10-2007/ for a simple proof of this.

I'm assuming that you're working under the assumption that most of the time, the record you are interested in is in your most recent smaller partitions. In the absence of any other information to hone on in the right partition, you could do

select * from
    ( select ...
      from   tab
      where  trans_dt >= DATE'2000-01-01'
      and    record_id = :my_record
      union all
      select x
      from   tab
      where  trans_dt < DATE'2000-01-01'
      and    record_id = :my_record
    )
where rownum = 1

which will only scan the big partition if we fall through and don't find it anywhere else.

But your problem does seem to be screaming out for an index to avoid all this work

Upvotes: 1

Marmite Bomber
Marmite Bomber

Reputation: 21075

Let's simplify your partitioned table as follows

CREATE TABLE tab
  (  trans_dt       DATE
  )
 PARTITION BY RANGE (trans_dt)
 ( PARTITION p0 VALUES LESS THAN (DATE'2000-01-01')
 , PARTITION p1 VALUES LESS THAN (DATE'2000-01-02')
 , PARTITION p2 VALUES LESS THAN (DATE'2000-01-03')
 , PARTITION p3 VALUES LESS THAN (DATE'2000-01-04')
 );

If you want to skip your large partition P0 in a query, you simple (as this is the first partition) constraints the partition key as trans_dt >= DATE'2000-01-01'

You will need two predicates and or to skip a partition in the middle

The query

 select * from tab
 where trans_dt >= DATE'2000-01-01';

Checking the execution plan you see the expected behaviour in Pstart = 2(i.e. the 1st partition is pruned).

---------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR  |      |     1 |     9 |     2   (0)| 00:00:01 |     2 |     4 |
|   2 |   TABLE ACCESS STORAGE FULL| TAB  |     1 |     9 |     2   (0)| 00:00:01 |     2 |     4 |
---------------------------------------------------------------------------------------------------

Remember, if you scan a partitioned table without constraining the partition key you will have to scall all partitions.

If you know, that most of the query results are in the recent and small partitions, simple scan tme in the first query

 select * from tab
 where trans_dt >= DATE'2000-01-01' and <your filter>

and only if you fail to get the row scan the large partition

 select * from tab
 where trans_dt < DATE'2000-01-01' and <your filter>

You will get much better response time on average if the assumption is true that the queries refer mostly the recent data.

Upvotes: 0

Related Questions