Reputation: 1276
I have a database view that tracks hourly data in an Oracle 12c database. The data is arranged like this, with columns for each hour of the day.
SALES_DATE | LOCATION_CODE | HE1_SALES | HE2_SALES | ... | HE24_SALES
_________________________________________________________________________
12/27/2019 | ABCD | 40 | 50 | ... | 60
12/26/2019 | ABCD | 51 | 64 | ... | 68
12/27/2019 | ABCG | 53 | 54 | ... | 50
12/26/2019 | ABCG | 45 | 47 | ... | 52
I have a stored procedure that looks back through the last 10 years and finds the dates whose hourly patterns were most similar to a user-defined day. It does this by taking the absolute difference for each hour and adding it to a running total. Any day whose total differential is within 50 is returned in the list (so the result set is usually pretty small). It then puts the results in a table so that users can come back and review them later, until they run the process again (when they view the data it is sorted most-similar first, based on that "proximity"). Here is the procedure in its current form:
CREATE OR REPLACE PROCEDURE MYDB.COMPARE_SALES_SP (
    p_compare_date    IN     DATE,
    p_location_code   IN     VARCHAR2,
    p_userid          IN     VARCHAR2,
    p_message            OUT VARCHAR2)
IS
    TYPE SalesArray IS TABLE OF NUMBER
        INDEX BY PLS_INTEGER;

    v_hours_count     INTEGER;
    v_difference      NUMBER;
    v_compare_sales   SalesArray;
    v_curr_sales      SalesArray;
BEGIN
    DELETE FROM MYDB.SALES_ANALYSIS
          WHERE userid = p_userid;

    IF TRUNC (SYSDATE) = TRUNC (p_compare_date)
    THEN
        v_hours_count := MYDB.F_HOUR_ENDING_NUMBER (SYSDATE);
    ELSE
        v_hours_count := 24;
    END IF;

    SELECT HE1_SALES, HE2_SALES, HE3_SALES, HE4_SALES, HE5_SALES, HE6_SALES,
           HE7_SALES, HE8_SALES, HE9_SALES, HE10_SALES, HE11_SALES, HE12_SALES,
           HE13_SALES, HE14_SALES, HE15_SALES, HE16_SALES, HE17_SALES, HE18_SALES,
           HE19_SALES, HE20_SALES, HE21_SALES, HE22_SALES, HE23_SALES, HE24_SALES
      INTO v_compare_sales (1), v_compare_sales (2), v_compare_sales (3), v_compare_sales (4),
           v_compare_sales (5), v_compare_sales (6), v_compare_sales (7), v_compare_sales (8),
           v_compare_sales (9), v_compare_sales (10), v_compare_sales (11), v_compare_sales (12),
           v_compare_sales (13), v_compare_sales (14), v_compare_sales (15), v_compare_sales (16),
           v_compare_sales (17), v_compare_sales (18), v_compare_sales (19), v_compare_sales (20),
           v_compare_sales (21), v_compare_sales (22), v_compare_sales (23), v_compare_sales (24)
      FROM MYDB.SALES_BY_DAY
     WHERE reading_date = TRUNC (p_compare_date) AND location_code = p_location_code;

    FOR i IN (SELECT *
                FROM MYDB.SALES_BY_DAY sd
               WHERE sd.READING_DATE > (SYSDATE - 3652)
                 AND sd.READING_DATE != TRUNC (p_compare_date)
                 AND location_code = p_location_code)
    LOOP
        v_difference := 0;

        SELECT i.HE1_SALES, i.HE2_SALES, i.HE3_SALES, i.HE4_SALES, i.HE5_SALES, i.HE6_SALES,
               i.HE7_SALES, i.HE8_SALES, i.HE9_SALES, i.HE10_SALES, i.HE11_SALES, i.HE12_SALES,
               i.HE13_SALES, i.HE14_SALES, i.HE15_SALES, i.HE16_SALES, i.HE17_SALES, i.HE18_SALES,
               i.HE19_SALES, i.HE20_SALES, i.HE21_SALES, i.HE22_SALES, i.HE23_SALES, i.HE24_SALES
          INTO v_curr_sales (1), v_curr_sales (2), v_curr_sales (3), v_curr_sales (4),
               v_curr_sales (5), v_curr_sales (6), v_curr_sales (7), v_curr_sales (8),
               v_curr_sales (9), v_curr_sales (10), v_curr_sales (11), v_curr_sales (12),
               v_curr_sales (13), v_curr_sales (14), v_curr_sales (15), v_curr_sales (16),
               v_curr_sales (17), v_curr_sales (18), v_curr_sales (19), v_curr_sales (20),
               v_curr_sales (21), v_curr_sales (22), v_curr_sales (23), v_curr_sales (24)
          FROM DUAL;

        FOR j IN 1 .. v_hours_count
        LOOP
            v_difference := v_difference + ABS (v_compare_sales (j) - v_curr_sales (j));
        END LOOP;

        IF (v_difference < 50)
        THEN
            INSERT INTO MYDB.SALES_ANALYSIS (READING_DATE, location_code, USERID, PROXIMITY)
                 VALUES (i.READING_DATE, i.location_code, p_userid, v_difference);
        END IF;
    END LOOP;

    COMMIT;
    p_message := 'Sales analysis successful. Please review the results';
EXCEPTION
    WHEN OTHERS
    THEN
        ROLLBACK;
        p_message := 'Sales analysis was not successful. Error: ' || SQLERRM;
END;
The procedure itself is pretty quick (~1 second), but the IDE we're using suggests using BULK COLLECT in the loop, both for code cleanliness and to make sure it keeps performing well. I'd like to do this, but I'm having trouble wrapping my head around how to select the row for the compare date and then compare it to all the other rows when using that method. Would BULK COLLECT be the best way to go about this, or is there a better way to do this many comparisons?
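Here's the rough shape I think the IDE has in mind, sketched inside the procedure (a sketch only; it reuses the existing declarations and parameters, and the compare-date row handling is the part I can't picture):
-- sketch only: two new declarations alongside the existing ones
-- TYPE SalesDayTab IS TABLE OF MYDB.SALES_BY_DAY%ROWTYPE;
-- v_days          SalesDayTab;

-- then one fetch replaces the cursor FOR loop
SELECT *
  BULK COLLECT INTO v_days
  FROM MYDB.SALES_BY_DAY sd
 WHERE sd.READING_DATE > (SYSDATE - 3652)
   AND sd.READING_DATE != TRUNC (p_compare_date)
   AND sd.location_code = p_location_code;

FOR i IN 1 .. v_days.COUNT
LOOP
    v_difference :=   ABS (v_compare_sales (1) - v_days (i).HE1_SALES)
                    + ABS (v_compare_sales (2) - v_days (i).HE2_SALES);
                    -- ...and so on through HE24_SALES; a record's fields
                    -- can't be indexed by position, which is part of what
                    -- I find awkward with 24 pivoted columns

    IF v_difference < 50
    THEN
        INSERT INTO MYDB.SALES_ANALYSIS (READING_DATE, location_code, USERID, PROXIMITY)
             VALUES (v_days (i).READING_DATE, v_days (i).location_code, p_userid, v_difference);
    END IF;
END LOOP;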
Edit
The data for this view comes from a table that is structured like this, in case it would be easier to select from the table itself.
SALES_DATE | HOUR_ENDING | LOCATION_CODE | VALUE
__________________________________________________________
12/27/2019 | 1 | ABCD | 40
12/27/2019 | 2 | ABCD | 50
12/27/2019 | 3 | ABCD | 51
The data must be compared hour by hour, not on a daily total (see the example below). Compared hourly, the example has a total difference of 35 (ABS is applied to every hour, since I don't care whether the difference is negative or positive, just how close it is); if the daily totals were summed up first and then compared, it would return a difference of only 9.
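An illustrative example over just four hours (made-up numbers, constructed to show the 35-vs-9 split):
COMPARE DAY | 40 | 50 | 60 | 70 | daily total 220
OTHER DAY   | 53 | 42 | 55 | 79 | daily total 229
ABS DIFF    | 13 |  8 |  5 |  9 | hourly sum = 35, vs. daily total difference = 9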
Upvotes: 0
Views: 262
Reputation: 14861
--- Revision
With the actual source table (rather than a pivoted view) we can build a much cleaner query. I've kept the structure of multiple CTEs, as it shows how the final result is built. I assume you already have a process in place for joining that view to the requested sales_analysis; after all, it consists of a single row per day per location per user. If that is not the case, then you need to post another question.
Since you didn't specify the source table name, I created hourly_sales to be that source. You will also need to test extensively, since you supplied an incomplete set of test data (just 3 hour-ending rows from one day) and no actual expected results in sales_analysis. In future questions please supply the actual table DDL, not just a description, plus full test data, as text or (better) INSERT statements, never images.
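For reference, here is the assumed shape I worked against (the table name and datatypes are my guesses; the rows are the three from your edit):
-- assumed source table; name and datatypes are guesses
create table hourly_sales (
    sales_date     date         not null,
    hour_ending    number(2)    not null,
    location_code  varchar2(10) not null,
    value          number
);

insert into hourly_sales values (date '2019-12-27', 1, 'ABCD', 40);
insert into hourly_sales values (date '2019-12-27', 2, 'ABCD', 50);
insert into hourly_sales values (date '2019-12-27', 3, 'ABCD', 51);
With that assumption stated, the result follows: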
create or replace procedure mydb.compare_sales_sp(
    p_compare_date    in  date,
    p_location_code   in  varchar2,
    p_userid          in  varchar2,
    p_message         out varchar2)
is
begin
    delete from mydb.sales_analysis
     where userid = p_userid;

    insert into mydb.sales_analysis(
           reading_date
         , location_code
         , userid
         , proximity
         )
    with hours_to_include as
         ( select case when trunc(sysdate) = trunc(p_compare_date)
                       then 1 + to_number(to_char(sysdate, 'hh24'))
                       else 24
                  end num_hours
             from dual
         )
       , compare_to as
         -- the hourly rows for the user-defined day
         ( select hs.*
             from hourly_sales hs
             join hours_to_include
               on hour_ending <= num_hours
            where sales_date    = p_compare_date
              and location_code = p_location_code
         )
       , daily_sales as
         -- every other day in the 10-year window
         ( select hs.*
             from hourly_sales hs
             join hours_to_include
               on hour_ending <= num_hours
            where location_code = p_location_code
              and sales_date    > add_months(sysdate, -120)
              and sales_date   != p_compare_date
         )
    select distinct
           sales_date
         , location_code
         , p_userid
         , prox
      from ( select ds.sales_date
                  , ds.location_code
                  , sum(abs(ds.value - ct.value))
                        over (partition by ds.sales_date, ds.location_code) prox
               from daily_sales ds
               join compare_to ct
                 on (    ds.location_code = ct.location_code
                     and ds.hour_ending   = ct.hour_ending
                    )
           )
     where prox < 50;

    commit;
    p_message := 'Sales analysis successful. Please review the results';
exception
    when others
    then
        rollback;
        p_message := 'Sales analysis was not successful. Error: ' || sqlerrm;
end compare_sales_sp;
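To sanity-check it, a call along these lines will do (the values are placeholders):
declare
    v_msg varchar2(4000);
begin
    mydb.compare_sales_sp(
        p_compare_date  => date '2019-12-27',   -- placeholder values
        p_location_code => 'ABCD',
        p_userid        => 'SOME_USER',
        p_message       => v_msg);
    dbms_output.put_line(v_msg);
end;
/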
---
In the long run, converting from row-by-row (aka slow-by-slow) processing to bulk processing could be beneficial. However, bulk processing typically requires more procedural logic. In this case I would recommend against refactoring to bulk processing, as it only goes halfway. Why not go all the way and get rid of the procedural code altogether? Let SQL do all the work. Your loops primarily just compute daily amounts from the 24 hourly columns, and SQL easily does that calculation. This procedure needs just one SQL statement (not counting the delete from prior runs).
create or replace procedure mydb.compare_sales_sp (
    p_compare_date    in  date,
    p_location_code   in  varchar2,
    p_userid          in  varchar2,
    p_message         out varchar2)
is
begin
    delete from mydb.sales_analysis
     where userid = p_userid;

    insert into mydb.sales_analysis(
           reading_date
         , location_code
         , userid
         , proximity
         )
    with hours_to_include as
         ( select case when trunc(sysdate) = trunc(p_compare_date)
                       then 1 + to_number(to_char(sysdate, 'hh24'))
                       else 24
                  end num_hours
             from dual
         )
       , compare_to as
         ( select num_hours, sales_date, location_code
                ,  case when num_hours >= 1 then nvl(he1_sales,0) else 0 end
                 + case when num_hours >= 2 then nvl(he2_sales,0) else 0 end
                 + case when num_hours >= 3 then nvl(he3_sales,0) else 0 end
                 + case when num_hours >= 4 then nvl(he4_sales,0) else 0 end
                 + case when num_hours >= 5 then nvl(he5_sales,0) else 0 end
                   as total_sales
             from mydb.sales_by_day
            cross join hours_to_include
            where sales_date    = p_compare_date
              and location_code = p_location_code
         )
       , daily_sales as
         ( select sales_date, location_code
                ,  case when num_hours >= 1 then nvl(he1_sales,0) else 0 end
                 + case when num_hours >= 2 then nvl(he2_sales,0) else 0 end
                 + case when num_hours >= 3 then nvl(he3_sales,0) else 0 end
                 + case when num_hours >= 4 then nvl(he4_sales,0) else 0 end
                 + case when num_hours >= 5 then nvl(he5_sales,0) else 0 end
                   as total_sales
             from mydb.sales_by_day
            cross join hours_to_include
            where location_code = p_location_code
              and sales_date    > add_months(sysdate, -120) -- 120 months = 10 years
         )
    select ds.sales_date
         , ds.location_code
         , p_userid
         , abs(ds.total_sales - ct.total_sales)
      from daily_sales ds
     cross join compare_to ct
     where abs(ct.total_sales - ds.total_sales) < 50;

    commit;
    p_message := 'Sales analysis successful. Please review the results';
exception
    when others
    then
        rollback;
        p_message := 'Sales analysis was not successful. Error: ' || sqlerrm;
end;
I used only 5 of the necessary 24 columns, but that's enough for you to see what's needed. The queries aren't very pretty, but they're better than procedural code. This is a consequence of not normalizing your data and putting a repeating group in a single row.
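As an aside: if only the pivoted view is reachable where this runs, the repeating group can also be unrolled on the fly, after which a per-hour query like the one in my revision applies. A sketch with UNPIVOT (column names taken from the view in the question):
-- unroll the pivoted view back into one row per hour
-- note: UNPIVOT drops rows where the sales value is null
--       unless you write UNPIVOT INCLUDE NULLS
select sales_date, location_code, hour_ending, value
  from mydb.sales_by_day
unpivot (value for hour_ending in (
    he1_sales  as 1,  he2_sales  as 2,  he3_sales  as 3,  he4_sales  as 4,
    he5_sales  as 5,  he6_sales  as 6,  he7_sales  as 7,  he8_sales  as 8,
    he9_sales  as 9,  he10_sales as 10, he11_sales as 11, he12_sales as 12,
    he13_sales as 13, he14_sales as 14, he15_sales as 15, he16_sales as 16,
    he17_sales as 17, he18_sales as 18, he19_sales as 19, he20_sales as 20,
    he21_sales as 21, he22_sales as 22, he23_sales as 23, he24_sales as 24));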
Upvotes: 3