SausageBuscuit

Reputation: 1276

Refactor Oracle stored procedure to use BULK COLLECT

I have a database view that tracks hourly data in an Oracle 12c database. The data is arranged like this, with columns for each hour of the day.

SALES_DATE  | LOCATION_CODE | HE1_SALES | HE2_SALES | ... | HE24_SALES
_________________________________________________________________________
12/27/2019  | ABCD          |    40     |    50     | ... |    60
12/26/2019  | ABCD          |    51     |    64     | ... |    68
12/27/2019  | ABCG          |    53     |    54     | ... |    50
12/26/2019  | ABCG          |    45     |    47     | ... |    52

I have a stored procedure that looks through the last 10 years and tries to find dates where the hourly pattern was most similar to a user-defined day. It does this by taking the difference for each hour and adding it to a running total. Any day whose total differential is within 50 is returned in the list (so the result set is usually pretty small). The results are then put in a table so that users can come back and review them later, until they run the process again (when they view the data it is sorted with the most similar day first, based on that "proximity"). Here is the procedure in its current form:

CREATE OR REPLACE PROCEDURE MYDB.COMPARE_SALES_SP (
    p_compare_date       IN DATE,
    p_location_code      IN VARCHAR2,
    p_userid             IN VARCHAR2,
    p_message            OUT VARCHAR2)
IS
    TYPE SalesArray IS TABLE OF NUMBER
        INDEX BY PLS_INTEGER;

    v_hours_count     INTEGER;
    v_difference      NUMBER;
    v_compare_sales   SalesArray;
    v_curr_sales      SalesArray;
BEGIN
    DELETE FROM MYDB.SALES_ANALYSIS
     WHERE userid = p_userid;
    IF TRUNC (SYSDATE) = TRUNC (p_compare_date)
    THEN
        v_hours_count := MYDB.F_HOUR_ENDING_NUMBER (SYSDATE);
    ELSE
        v_hours_count := 24;
    END IF;

    SELECT HE1_SALES, HE2_SALES, HE3_SALES, HE4_SALES, HE5_SALES, HE6_SALES,
           HE7_SALES, HE8_SALES, HE9_SALES, HE10_SALES, HE11_SALES, HE12_SALES,
           HE13_SALES, HE14_SALES, HE15_SALES, HE16_SALES, HE17_SALES, HE18_SALES,
           HE19_SALES, HE20_SALES, HE21_SALES, HE22_SALES, HE23_SALES, HE24_SALES
      INTO v_compare_sales (1), v_compare_sales (2), v_compare_sales (3), v_compare_sales (4),
           v_compare_sales (5), v_compare_sales (6), v_compare_sales (7), v_compare_sales (8),
           v_compare_sales (9), v_compare_sales (10), v_compare_sales (11), v_compare_sales (12),
           v_compare_sales (13), v_compare_sales (14), v_compare_sales (15), v_compare_sales (16),
           v_compare_sales (17), v_compare_sales (18), v_compare_sales (19), v_compare_sales (20),
           v_compare_sales (21), v_compare_sales (22), v_compare_sales (23), v_compare_sales (24)
      FROM MYDB.SALES_BY_DAY
     WHERE reading_date = TRUNC (p_compare_date) AND location_code = p_location_code;

    FOR i
        IN (SELECT *
              FROM MYDB.SALES_BY_DAY sd
             WHERE sd.READING_DATE > (SYSDATE - 3652)
               AND sd.READING_DATE != TRUNC(p_compare_date)
               AND location_code = p_location_code)
    LOOP
        v_difference := 0;

        SELECT i.HE1_SALES, i.HE2_SALES, i.HE3_SALES, i.HE4_SALES, i.HE5_SALES, i.HE6_SALES,
               i.HE7_SALES, i.HE8_SALES, i.HE9_SALES, i.HE10_SALES, i.HE11_SALES, i.HE12_SALES,
               i.HE13_SALES, i.HE14_SALES, i.HE15_SALES, i.HE16_SALES, i.HE17_SALES, i.HE18_SALES,
               i.HE19_SALES, i.HE20_SALES, i.HE21_SALES, i.HE22_SALES, i.HE23_SALES, i.HE24_SALES
          INTO v_curr_sales (1), v_curr_sales (2), v_curr_sales (3), v_curr_sales (4),
               v_curr_sales (5), v_curr_sales (6), v_curr_sales (7), v_curr_sales (8),
               v_curr_sales (9), v_curr_sales (10), v_curr_sales (11), v_curr_sales (12),
               v_curr_sales (13), v_curr_sales (14), v_curr_sales (15), v_curr_sales (16),
               v_curr_sales (17), v_curr_sales (18), v_curr_sales (19), v_curr_sales (20),
               v_curr_sales (21), v_curr_sales (22), v_curr_sales (23), v_curr_sales (24)
          FROM DUAL;

        FOR j IN 1 .. v_hours_count
        LOOP
            v_difference := v_difference + ABS (v_compare_sales (j) - v_curr_sales (j));
        END LOOP;
        IF (v_difference < 50)
        THEN
            INSERT INTO MYDB.SALES_ANALYSIS (READING_DATE, location_code, USERID, PROXIMITY)
                VALUES (i.READING_DATE, i.location_code, p_userid, v_difference);
        END IF;

    END LOOP;
    COMMIT;
    p_message := 'Sales analysis successful. Please review the results';
EXCEPTION
    WHEN OTHERS
    THEN
        ROLLBACK;
        p_message := 'Sales analysis was not successful. Error: ' || SQLERRM;
END;

The procedure itself is pretty quick (~1 second); however, the IDE we're using suggests using BULK COLLECT in the loop, both for code cleanliness and to ensure it continues to perform well. I would like to do this, but I'm having trouble wrapping my head around how I should select the row for the compare date and then compare it to all the other rows when using that method. Would BULK COLLECT be the best way to go about this, or is there a better way to handle this many comparisons?
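
A minimal sketch of what a BULK COLLECT fetch of the candidate rows might look like (assuming the SALES_BY_DAY view described above; the literal values stand in for the procedure's parameters):

DECLARE
    -- Collection of whole rows from the view
    TYPE SalesRowTab IS TABLE OF MYDB.SALES_BY_DAY%ROWTYPE;

    v_rows   SalesRowTab;
BEGIN
    -- Fetch every candidate day in one round trip
    SELECT *
      BULK COLLECT INTO v_rows
      FROM MYDB.SALES_BY_DAY
     WHERE reading_date > (SYSDATE - 3652)
       AND reading_date != TRUNC (SYSDATE)      -- p_compare_date in the real procedure
       AND location_code = 'ABCD';              -- p_location_code in the real procedure

    -- The collection can then be walked with a numeric FOR loop, but each element
    -- still has to be compared hour column by hour column against the compare-date row.
    FOR i IN 1 .. v_rows.COUNT
    LOOP
        NULL;  -- per-row comparison would go here
    END LOOP;
END;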

Edit

The data in this view comes from a table that is structured like this, in case it would be easier to select the data from the table itself.

SALES_DATE |  HOUR_ENDING  |  LOCATION_CODE |  VALUE
__________________________________________________________
12/27/2019        1              ABCD           40
12/27/2019        2              ABCD           50
12/27/2019        3              ABCD           51

The data must be compared hourly, not on a daily total (see the image below). In that example, comparing hourly gives a total difference of 35 (ABS is applied to every hour, since I don't care whether the difference is negative or positive, just how close it is). However, if the daily totals were compared instead, the difference would only be 9.

[Image: Comparison of Total vs Hourly]
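
The original image is not available here, but the following illustrative numbers (not taken from the image) reproduce the same 35-versus-9 contrast using just three hours:

-- Illustrative only: day A = (40, 50, 60), day B = (20, 63, 58)
SELECT ABS (40 - 20) + ABS (50 - 63) + ABS (60 - 58)  AS hourly_difference,  -- 20 + 13 + 2 = 35
       ABS ((40 + 50 + 60) - (20 + 63 + 58))          AS total_difference    -- 150 - 141 = 9
  FROM DUAL;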

Upvotes: 0

Views: 262

Answers (1)

Belayer

Reputation: 14861

--- Revision
With the actual source table (rather than a pivoted view), we can build a much cleaner query. I've kept the structure of multiple CTEs, as this shows how the final result is built. I assume you already have a process in place for joining that view to the requested sales_analysis; after all, it consists of a single row per day, per location, per user. If that is not the case, then you need to post another question.

Since you neglected to specify the source table name, I created hourly_sales to be that source. Also, you will need to test extensively, since you supplied only a partial set of test data (just 3 hour-ending rows from one day) and no actual expected results for sales_analysis. In future questions, please supply the actual table DDL, not just a description, and full test data as text or, better yet, as INSERT statements (never as images). Anyway, the result follows:

create or replace procedure compare_sales_sp(
    p_compare_date       in date,
    p_location_code      in varchar2,
    p_userid             in varchar2,
    p_message            out varchar2)
is
begin
    delete from mydb.sales_analysis
     where userid = p_userid;

     insert into mydb.sales_analysis(
                               reading_date  
                              ,location_code  
                              ,userid         
                              ,proximity
                              )
        with hours_to_include as
             ( select case when trunc (sysdate) = trunc (p_compare_date)
                           then 1+to_number(to_char(sysdate, 'hh24'))
                           else 24
                      end num_hours
                 from dual
             )          
           , compare_to as
             ( select hs.* 
                 from hourly_sales hs     
                 join hours_to_include
                   on hour_ending <= num_hours
                where sales_date = p_compare_date
                  and location_code = p_location_code
             ) 
           , daily_sales as
             ( select hs.* 
                 from hourly_sales hs 
                 join hours_to_include
                   on hour_ending <= num_hours
                where location_code = p_location_code
                  and sales_date > add_months(sysdate, -120)
                  and sales_date != p_compare_date       
              )       
         select distinct   
                sales_date
              , location_code
              , p_userid 
              , prox  
           from ( select ds.sales_date
                       , ds.location_code
                       , sum(abs(ds.value - ct.value)) over( partition by ds.sales_date, ds.location_code) prox 
                    from daily_sales ds 
                    join compare_to  ct
                      on (ds.location_code = ct.location_code and 
                          ds.hour_ending   = ct.hour_ending
                         )
                 )
           where prox < 50;                                       
    commit;
    p_message := 'Sales analysis successful. Please review the results';
exception
    when others
    then
        rollback;
        p_message := 'Sales analysis was not successful. Error: ' || sqlerrm;
end compare_sales_sp;
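
If it helps for testing, the procedure can be exercised from an anonymous block like this (a sketch only; the date, location, and user id are example values, not values from your data):

declare
    v_message   varchar2(4000);
begin
    compare_sales_sp(
        p_compare_date  => date '2019-12-27',
        p_location_code => 'ABCD',
        p_userid        => 'SOME_USER',
        p_message       => v_message);
    -- show the status message returned by the procedure
    dbms_output.put_line(v_message);
end;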

---

In the long run, converting from row-by-row (a.k.a. slow-by-slow) processing to bulk processing could be beneficial. However, bulk processing typically requires more procedural logic, and in this case I would recommend against refactoring to bulk processing because it only goes halfway. Why not go all the way and get rid of the procedural code altogether? Let SQL do all the work. Your loops primarily just compute the daily amounts from the 24 hourly columns, and SQL easily does that calculation. This procedure needs just one SQL statement (not counting the delete of the prior run's results).

create or replace procedure mydb.compare_sales_sp (
    p_compare_date       in date,
    p_location_code      in varchar2,
    p_userid             in varchar2,
    p_message            out varchar2)
is
begin
    delete from mydb.sales_analysis
     where userid = p_userid;

     insert into mydb.sales_analysis(
                               reading_date  
                              ,location_code  
                              ,userid         
                              ,proximity
                              )
        with hours_to_include as
             ( select case when trunc (sysdate) = trunc (p_compare_date)
                           then 1+to_number(to_char(sysdate, 'hh24'))
                           else 24
                      end num_hours
                 from dual
             )          
           , compare_to as
             ( select num_hours, sales_date,location_code
                    , case when num_hours >= 1 then nvl(he1_sales,0) else 0 end
                     +case when num_hours >= 2 then nvl(he2_sales,0) else 0 end
                     +case when num_hours >= 3 then nvl(he3_sales,0) else 0 end
                     +case when num_hours >= 4 then nvl(he4_sales,0) else 0 end
                     +case when num_hours >= 5 then nvl(he5_sales,0) else 0 end 
                     as total_sales
                 from sales_by_day cross join hours_to_include
                where sales_date = p_compare_date
                  and location_code = p_location_code
             ) 
           , daily_sales as
             ( select sales_date,location_code
                    , case when num_hours >= 1 then nvl(he1_sales,0) else 0 end
                     +case when num_hours >= 2 then nvl(he2_sales,0) else 0 end
                     +case when num_hours >= 3 then nvl(he3_sales,0) else 0 end
                     +case when num_hours >= 4 then nvl(he4_sales,0) else 0 end
                     +case when num_hours >= 5 then nvl(he5_sales,0) else 0 end
                     as total_sales
                 from mydb.sales_by_day 
                cross join hours_to_include
                where location_code = p_location_code
                  and sales_date > add_months(sysdate, -120)  -- 120 months = 10 years
             ) 
        select ds.sales_date
             , ds.location_code
             , p_userid
             , abs(ds.total_sales - ct.total_sales) 
          from daily_sales ds
         cross join compare_to  ct 
         where abs(ct.total_sales -ds.total_sales) < 50;

    commit;
    p_message := 'Sales analysis successful. Please review the results';
exception
    when others
    then
        rollback;
        p_message := 'Sales analysis was not successful. Error: ' || sqlerrm;
end; 

I used only 5 of the necessary 24 columns, but that is enough for you to see what's needed. The queries aren't very pretty, but they are better than procedural code. That is a consequence of not normalizing your data and putting a repeating group in a single row.
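
If normalizing the base data isn't an option, one workaround (a sketch only, assuming the column names shown in the question) is to unpivot the 24 HEx_SALES columns on the fly, which yields the same one-row-per-hour shape as the hourly_sales table assumed in the revision above:

-- Unpivot the pivoted view into (sales_date, location_code, hour_ending, value) rows.
-- Note: rows where the hourly value is NULL are excluded by default.
select sales_date, location_code, hour_ending, value
  from mydb.sales_by_day
unpivot ( value for hour_ending in (
            he1_sales  as 1,  he2_sales  as 2,  he3_sales  as 3,  he4_sales  as 4,
            he5_sales  as 5,  he6_sales  as 6,  he7_sales  as 7,  he8_sales  as 8,
            he9_sales  as 9,  he10_sales as 10, he11_sales as 11, he12_sales as 12,
            he13_sales as 13, he14_sales as 14, he15_sales as 15, he16_sales as 16,
            he17_sales as 17, he18_sales as 18, he19_sales as 19, he20_sales as 20,
            he21_sales as 21, he22_sales as 22, he23_sales as 23, he24_sales as 24 ) );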

Upvotes: 3
