Raj More
Raj More

Reputation: 48024

Efficient way to query for the existence of data in a partitioned table

I am in Oracle 11G Enterprise Edition 11.2.0.4.0

I have a table that has about 12M rows per partition. The partitioning is by SnapshotDate.

I need to evaluate whether the last 15 days worth of snapshots have any data at all.

The most found answer online tells me to use Row_Number() Over (Partition By SnapshotDate Order By SnapshotDate)`. Here's the code I came up with (it only returns the dates that have values so far, so I need a left join with my calendar table, of course):

;With OneDateAllDates As
(
  /* 
      partition by snapshot date so that numbering starts over again
      i have to use an order by - it gave me an error without one
  */
  Select SnapshotDate, 1 HasData, Row_Number() Over (Partition By SnapshotDate Order By SnapshotDate) RowNumber
  From FactTable
  Where SnapshotDate IN
  (
    /* any mechanism that gives me the last 10 calendar days will do*/
    Select CalendarTimeId
    From DimCalendar
    Where CalendarDate Between To_Date ('20161208', 'yyyymmdd') - 15 And To_Date ('20161208', 'yyyymmdd')
  )
)
Select *
From AllDates
Where RowNumber = 1;

However, ordering 12M rows over 15 days is insanely expensive - I'm sorting 180 million rows to get back 15 rows. Here is my desired output:

Date          HasData
===========   =======
12/08/2016    1
12/07/2016    1
12/06/2016    0
12/05/2016    0
12/04/2016    1
12/03/2016    0
12/02/2016    1
12/01/2016    0    
etc etc

Is there a more efficient way to write such a query?

Upvotes: 3

Views: 256

Answers (1)

Jon Heller
Jon Heller

Reputation: 36832

I don't think there's a clean way to combine partition pruning and Top N reporting. The below code is ugly and repetitive but it gets the job done quickly.

It reads each of the last 15 daily partitions but the rownum = 1 makes it a quick read. The date can be replaced by a bind variable, but the numbers 0 through 15 have to be hard-coded. If you need a variable number of days, you can hard-code dozens or hundreds of subqueries, and then filter them out later with another bind variable. Running hundreds of subqueries is not ideal but it would still be much faster than reading 180 million rows.

Query

select CalendarDate, nvl(has_data, 0) has_data
from
(
    --The last 15 days.
    Select CalendarDate
    From DimCalendar
    Where CalendarDate Between To_Date ('20161208', 'yyyymmdd') - 15 And To_Date ('20161208', 'yyyymmdd')
) last_15_days
left join
(
    --The last 15 days of data, if any.
    select date '2016-12-08' - 0 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 0) and rownum = 1 union all
    select date '2016-12-08' - 1 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 1) and rownum = 1 union all
    select date '2016-12-08' - 2 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 2) and rownum = 1 union all
    select date '2016-12-08' - 3 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 3) and rownum = 1 union all
    select date '2016-12-08' - 4 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 4) and rownum = 1 union all
    select date '2016-12-08' - 5 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 5) and rownum = 1 union all
    select date '2016-12-08' - 6 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 6) and rownum = 1 union all
    select date '2016-12-08' - 7 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 7) and rownum = 1 union all
    select date '2016-12-08' - 8 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 8) and rownum = 1 union all
    select date '2016-12-08' - 9 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 9) and rownum = 1 union all
    select date '2016-12-08' - 10 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 10) and rownum = 1 union all
    select date '2016-12-08' - 11 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 11) and rownum = 1 union all
    select date '2016-12-08' - 12 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 12) and rownum = 1 union all
    select date '2016-12-08' - 13 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 13) and rownum = 1 union all
    select date '2016-12-08' - 14 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 14) and rownum = 1 union all
    select date '2016-12-08' - 15 the_date, 1 has_data from FactTable where SnapshotDate in (Select CalendarDate From DimCalendar Where CalendarDate = date '2016-12-08' - 15) and rownum = 1
) data_from_last_15_days
    on last_15_days.CalendarDate = data_from_last_15_days.the_date
order by CalendarDate desc;

Test Schema

create table FactTable
(
    id number,
    SnapshotDate date
) nologging
partition by range (SnapshotDate)
interval (interval '1' day)
(
    partition p1 values less than (date '2000-01-01')
);

create table DimCalendar
(
    CalendarDate date
);

--Add last year into calendar.
insert into DimCalendar
select date '2016-01-01' + (level - 1)
from dual
connect by level <= 365;


--Insert 1.2 million rows per day.
begin
    for i in 1 .. 15 loop
        insert /*+ append */ into facttable select level, date '2016-12-01' + i from dual connect by level <= 1200000;
        commit;
    end loop;
end;
/

Upvotes: 2

Related Questions