Avoiding multiple calculation for the same values

Is there a way that I can improve the performance of this query by avoiding multiple calculation of some repeating values like

regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|TO|.csv|\d+:\d{2}:\d{2}', '')

? and

to_date(regexp_replace(regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', ''), '^(\d{4}-\d+-\d+)_.+$', '\1'), 'YYYY-MM-DD')

This columns are calculated 3 and 2 time and I runned some test and only by removing the date_start column the query performance improved by approx 20 seconds. I am thinking if oracle provides a better way to retain the values and avoid multiple calculations would be great. Also I would want to avoid

The Actual QUERY:

select * 
from (
  select
      row_number() over (partition by DCRAINTERNALNUMBER, ISSUE_DATE, PERMIT_ID order by to_date(regexp_replace(regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', ''), '^.+_(\d{4}-\d+-\d+)_$', '\1'), 'YYYY-MM-DD') desc) as row_order,
      to_date(regexp_replace(regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', ''), '^(\d{4}-\d+-\d+)_.+$', '\1'), 'YYYY-MM-DD') as date_start,
      to_date(regexp_replace(regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', ''), '^.+_(\d{4}-\d+-\d+)_$', '\1'), 'YYYY-MM-DD') as date_end,
      temp2.* 
  from schema.TABLE_NAME temp2
) t

I also tried to simulate something like this to avoid multiple calculations but due to all the nested select statements it doesn't improve anything... It makes the query slower with approx 25 seconds:

select * 
from (
  select row_number() over (partition by DCRAINTERNALNUMBER, ISSUE_DATE, PERMIT_ID order by date_end desc) as row_order,
  temp1.*
  from (
      select to_date(regexp_replace(date_raw, '^(\d{4}-\d+-\d+)_.+$', '\1'), 'YYYY-MM-DD') as date_start,
          to_date(regexp_replace(date_raw, '^.+_(\d{4}-\d+-\d+)_$', '\1'), 'YYYY-MM-DD') as date_end,
      temp2.* 
      from (
      select regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', '') as date_raw,
        temp3.*
        from schema.TABLE_NAME temp3
      ) temp2

  ) temp1
) t

Upvotes: 0

Answers (3)

David Aldridge

Reputation: 52386

If this were a deterministic PL/SQL function with few input values, I'd certainly try changing:

select expensive_function(some_value)
from   large_table;

... to ...

select (select expensive_function(some_value) from dual)
from   large_table;

... as Oracle has a caching mechanism for that. You might give it a go just pushing the SQL expressions into subqueries, and if that doesn't work (or even if it does) I'd move that SQL code into a PL/SQL function and try it.

Oh, if you're on 11g of course I'd use pl/sql function caching directly: http://www.oracle.com/technetwork/issue-archive/2007/07-sep/o57asktom-101814.html

Upvotes: 2

Dave

Reputation: 151

It doesn't answer your specific question about reducing the number of calls to the function but have you considered using regexp_substr rather than multiple calls to the regexp_replace function? I would think this would do less work and should be quicker. It should also be less likely to give you an exception if the data doesn't quite match (e.g. if the file name is .txt instead of .csv)

Something like...

select * 
from (
  select row_number() over (partition by DCRAINTERNALNUMBER, ISSUE_DATE, PERMIT_ID order by to_date(regexp_substr(my_source_file,'\d{4}-\d{1,2}-\d{1,2}'),'yyyy-mm-dd') desc) as row_order,
  to_date(regexp_substr(my_source_file,'\d{4}-\d{1,2}-\d{1,2}'),'yyyy-mm-dd') as date_start,
  to_date(regexp_substr(my_source_file,'\d{4}-\d{1,2}-\d{1,2}',1,2),'yyyy-mm-dd') as date_end,
  temp2.* 
from schema.TABLE_NAME temp2) t

If I have interpreted your data properly

TABLE_NAME_2011-3-1_11:00:00_AM_TO_2013-4-24_12:00:00_AM.csv

In this 2011-3-1 is the start date and 2013-4-24 is the end date. I can get these into a date by using the same pattern matching but picking the first instance for start date (no parameters are required as this is the default) and the second instance for the end date (this requires the extra ,1,2 for the substr to start at the beginning (character 1) and pick the second instance.

Hope that helps.

Upvotes: 1

A.B.Cade

Reputation: 16915

Not sure that will improve much, but you can use a CTE:

with cte as (
  select tt.*, to_date(regexp_replace(regexp_replace(my_source_file, 'TABLE_NAME_|_AM|_PM|_TO_|\.csv|\d+:\d{2}:\d{2}', ''), '^.+_(\d{4}-\d+-\d+)_$', '\1'), 'YYYY-MM-DD') calc_val
  from schema.TABLE_NAME tt
)
select * 
from (
  select
      row_number() over (partition by DCRAINTERNALNUMBER, ISSUE_DATE, PERMIT_ID order by calc_val desc) as row_order,
      calc_val as date_start,
      calc_val as date_end,
      temp2.* 
  from cte temp2
) t

Upvotes: 0

Avoiding multiple calculation for the same values

Answers (3)

Related Questions