Reputation: 89
Let's imagine that the table my_table
is divided into 1000 partitions as the following example:
P1, P2, P3, ... , P997, P998, P999, P1000
Partitions are organized by dates, mostly a partition per day. E.g.:
P0 < 01/01/2000 => Contains around 472M records
P1 = 01/01/2000 => Contains around 15k records
P2 = 02/01/2000 => Contains around 15k records
P3 = 03/01/2000 => Contains around 15k records
... = ../../.... => Contains around ... records
P997 = 07/04/2000 => Contains around 15k records
P998 = 08/04/2000 => Contains around 15k records
P999 = 09/04/2000 => Contains around 15k records
P1000 = 10/04/2000 => Contains around 15k records
Please notice that P0 is <
to 01/01/2000, NOT =
When looking for a specific record without knowing the date, I am doing a:
SELECT * FROM my_schema.my_table WHERE ... ;
But this take too much time because it does include P0
(30s).
So the best idea would be to execute an SQL query such as:
SELECT * FROM my_schema.my_table FROM PARTITION(P42) WHERE ... ;
But we never know in which partition is the record. We don't know either the date associated to the partition. And of course we won't loop over all partitions 1 by 1
I could be clever by doing 5 by 5:
SELECT * FROM my_schema.my_table FROM PARTITION(P40,P41,P42,P43,P44) WHERE ... ;
However same issue as above, I won't loop over all partitions, even 5 by 5
I won't run either do (excluding P0
in the list):
SELECT * FROM my_schema.my_table FROM PARTITION(P1,P2,...,P99,P100) WHERE ... ;
The query would be too long and I would have to compute for each request the list of partitions names since it could not always start by P1
or end by P100
(each days some partitions are dropped, some are created)
How can I do something like this?
SELECT * FROM my_schema.my_table NOT IN PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table PARTITION(*,-P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table LESS PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table EXCLUDE PARTITION(P0) WHERE ... ;
I mean a way to select all partitions expect one or some of them?
Note: I don't know in advance the value of the dateofSale. Inside the table, we have something like
CREATE TABLE my_table
(
recordID NUMBER(16) NOT NULL, --not primary
dateOfSale DATE NOT NULL, --unknown
....
<other fields>
)
WHERE date > 2019
clause and index the date field for multiples reasons that would take too much time to explain here.Upvotes: 1
Views: 1513
Reputation: 36817
Although there is no syntax to exclude a specific partition, you can build a pipelined table function that dynamically builds a query that uses every partition except for one.
The table function builds a query like the one below. The function uses the data dictionary view USER_TAB_PARTITIONS
to get the partition names to build the SQL, uses dynamic SQL to execute the query, and then pipes the results back to the caller.
select * from my_table partition (P1) union all
select * from my_table partition (P2) union all
...
select * from my_table partition (P1000);
CREATE TABLE my_table
(
recordID NUMBER(16) NOT NULL, --not primary
dateOfSale DATE NOT NULL, --unknown
a NUMBER
)
partition by range (dateOfSale)
(
partition p0 values less than (date '2000-01-01'),
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03')
);
insert into my_table
select 1,date '1999-12-31',1 from dual union all
select 2,date '2000-01-01',1 from dual union all
select 3,date '2000-01-02',1 from dual;
commit;
create or replace package my_table_pkg is
type my_table_nt is table of my_table%rowtype;
function get_everything_but_p0 return my_table_nt pipelined;
end;
/
create or replace package body my_table_pkg is
function get_everything_but_p0 return my_table_nt pipelined is
v_sql clob;
v_results my_table_nt;
v_cursor sys_refcursor;
begin
--Build SQL that referneces all partitions.
for partitions in
(
select partition_name
from user_tab_partitions
where table_name = 'MY_TABLE'
and partition_name <> 'P0'
) loop
v_sql := v_sql || chr(10) || 'union all select * from my_table ' ||
'partition (' || partitions.partition_name || ')';
end loop;
v_sql := substr(v_sql, 12);
--Print the query for debugging:
dbms_output.put_line(v_sql);
--Gather the results in batches and pipe them out.
open v_cursor for v_sql;
loop
fetch v_cursor bulk collect into v_results limit 100;
exit when v_results.count = 0;
for i in 1 .. v_results.count loop
pipe row (v_results(i));
end loop;
end loop;
close v_cursor;
end;
end;
/
The package uses 12c's ability to define types in package specifications. If you build this in 11g or below, you'll need to create SQL types instead. This package only works for one table, but if necessary there are ways to create functions that work with any table (using Oracle data cartridge or 18c's polymorphic table functions).
SQL> select * from table(my_table_pkg.get_everything_but_p0);
RECORDID DATEOFSAL A
---------- --------- ----------
2 01-JAN-00 1
3 02-JAN-00 1
This function should perform almost as well as the clever solution you were looking for. There will be overhead because the rows get passed through PL/SQL. But most importantly, the function builds a SQL statement that partition prunes away the large P0 partition.
One possible issue with this function is that the optimizer has no visibility inside it and can't create a good row cardinality estimate. If you use the function as part of another large SQL statement, be aware that the optimizer will blindly guess that the function returns 8168 rows. That bad cardinality guess may lead to a bad execution plan.
Upvotes: 0
Reputation: 11591
The query below, ie two queries in a UNION ALL but I only want 1 row, will stop immediately a row is found. We do not need to go into the second part of the UNION ALL if we get a row in the first.
SQL> select * from
2 ( select x
3 from t1
4 where x = :b1
5 union all
6 select x
7 from t2
8 where x = :b1
9 )
10 where rownum = 1
11 /
See https://connor-mcdonald.com/golden-oldies/first-match-written-15-10-2007/ for a simple proof of this.
I'm assuming that you're working under the assumption that most of the time, the record you are interested in is in your most recent smaller partitions. In the absence of any other information to hone on in the right partition, you could do
select * from
( select ...
from tab
where trans_dt >= DATE'2000-01-01'
and record_id = :my_record
union all
select x
from tab
where trans_dt < DATE'2000-01-01'
and record_id = :my_record
)
where rownum = 1
which will only scan the big partition if we fall through and don't find it anywhere else.
But your problem does seem to be screaming out for an index to avoid all this work
Upvotes: 1
Reputation: 21075
Let's simplify your partitioned table as follows
CREATE TABLE tab
( trans_dt DATE
)
PARTITION BY RANGE (trans_dt)
( PARTITION p0 VALUES LESS THAN (DATE'2000-01-01')
, PARTITION p1 VALUES LESS THAN (DATE'2000-01-02')
, PARTITION p2 VALUES LESS THAN (DATE'2000-01-03')
, PARTITION p3 VALUES LESS THAN (DATE'2000-01-04')
);
If you want to skip your large partition P0
in a query, you simple (as this is the first partition) constraints the partition key as trans_dt >= DATE'2000-01-01'
You will need two predicates and or
to skip a partition in the middle
The query
select * from tab
where trans_dt >= DATE'2000-01-01';
Checking the execution plan you see the expected behaviour in Pstart = 2
(i.e. the 1st partition is pruned).
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ITERATOR | | 1 | 9 | 2 (0)| 00:00:01 | 2 | 4 |
| 2 | TABLE ACCESS STORAGE FULL| TAB | 1 | 9 | 2 (0)| 00:00:01 | 2 | 4 |
---------------------------------------------------------------------------------------------------
Remember, if you scan a partitioned table without constraining the partition key you will have to scall all partitions.
If you know, that most of the query results are in the recent and small partitions, simple scan tme in the first query
select * from tab
where trans_dt >= DATE'2000-01-01' and <your filter>
and only if you fail to get the row scan the large partition
select * from tab
where trans_dt < DATE'2000-01-01' and <your filter>
You will get much better response time on average if the assumption is true that the queries refer mostly the recent data.
Upvotes: 0