Reputation: 1282
I am working on a data warehousing project, and therefore, I have been implementing some ETL Functions in Packages. I first encountered a problem on my developing laptop and thought it had something to do with my oracle installation, but now it has "spread" over to the production servers. Two functions "sometimes" become incredible slow. We have implemented a logging system, giving us output on a logging table each x rows. When the function usually needs like 10 seconds per chunk, "sometimes" the functions needs up to 3 minutes. After rebuilding some indexes and restarting the function, it is as quick again as it used to be. Unfortunately, I can't tell which index it is exactly, since restarting the function and building up the cursor it uses for its work takes some time and we do not have the time to check each index on its own, so I just rebuild all indexes that are potentially used by the function and restart it.
The functions that have the problem use a cursor to select data from a table with about 50 million to 200 million entries, joined by a small table with about 50-500 entries. The join condition is a string comparison. We then use the primary key from the small table we get from the join to update a foreign key on the main table. The update process is done by a forall loop, this has proven to save loads of time.
Here is a simplified version of the table structure of both tables:
CREATE TABLE "maintable"
( "pkmid" NUMBER(11,0) NOT NULL ENABLE,
"fkid" NUMBER(11,0),
"fkstring" NVARCHAR2(4) NOT NULL ENABLE,
CONSTRAINT "PK_MAINTABLE" PRIMARY KEY ("pkmid");
CREATE TABLE "smalltable"
( "pksid" NUMBER(11,0) NOT NULL ENABLE,
"pkstring" NVARCHAR2(4) NOT NULL ENABLE,
CONSTRAINT "PK_SMALLTABLE" PRIMARY KEY ("pksid");
Both tables have indexes on their string columns. Adding the primary keys, I therefore rebuild 4 indexes each time the problem happens.
We get our data in a way, that we only have the fkstring in the maintable available and the fkid is set to null. In a first step, we populate the small table. This only takes minutes and is done the following way:
INSERT INTO smalltable (pksid, pkstring)
SELECT SEQ_SMALLTABLE.NEXTVAL, fkstring
FROM
(
SELECT DISTINCT mt.fkstring
FROM maintable mt
MINUS
SELECT st.pkstring
FROM smalltable st
);
commit;
This function never causes any trouble.
The following function does (it is a simplified version of the function - I have removed logging and exception handling and renamed some variables):
function f_set_fkid return varchar2 is
cursor lCursor_MAINTABLE is
SELECT MT.PKmID, st.pksid
FROM maintable mt
JOIN smalltable st ON (mt.fkstring = st.pkstring)
WHERE mt.fkid IS NULL;
lIndex number := 0;
lExitLoop boolean := false;
type lCursorType is table of lCursor_MAINTABLE%rowtype index by pls_integer;
lCurrentRow lCursor_MAINTABLE%rowtype;
lTempDataArray lCursorType;
lCommitEvery constant number := 1000;
begin
open lCursor_MAINTABLE;
loop
-- get next row, set exit condition
fetch lCursor_MAINTABLE into lCurrentRow;
if (lCursor_MAINTABLE%notfound) then
lExitLoop := true;
end if;
-- in case of cache being full, flush cache
if ((lTempDataArray.count > 0) AND (lIndex >= lCommitEvery OR lExitLoop)) then
forall lIndex2 in lTempDataArray.FIRST..lTempDataArray.LAST
UPDATE maintable mt
set fkid = lTempDataArray(lIndex2).pksid
WHERE mt.pkmid = lTempDataArray(lIndex2).pkmid;
commit;
lTempDataArray.delete;
lIndex := 0;
end if;
-- data handling, fill cache
if (lExitLoop = false) then
lIndex := lIndex + 1;
lTempDataArray(lIndex). := lCurrentRow;
end if;
exit when lExitLoop;
end loop;
close lCursor_MAINTABLE;
return null;
end;
I would be very thankful for any help.
P.S. I do know that bulk collect into would speed up the function and probably also ease up the code a bit too, but at the moment we are content with the speed of the function it usually has. Changing the function to use bulk collect is on our plan for next year, but at the moment it is not an option (and I doubt it would solve this index problem).
Upvotes: 1
Views: 674
Reputation: 8905
function f_set_fkid return varchar2 is
cursor lCursor_MAINTABLE is
SELECT MT.PKmID, st.pksid
FROM maintable mt
JOIN smalltable st ON (mt.fkstring = st.pkstring)
WHERE mt.fkid IS NULL;
commit_every INTGER := 1000000;
commit_counter INTEGER :=0;
begin
for c in lCursor_MAINTABLE
loop
UPDATE maintable mt
set fkid = c.pksid
WHERE mt.pkmid = c.pkmid;
commit_counter := commit_counter+1;
if mod(commit_every,commit_counter) = 0
then
commit;
commit_counter := 0;
end if;
end loop;
return null;
end;
Upvotes: 0
Reputation: 33273
If you have a table where the number of rows fluctuates wildly (as in when doing ETL loads) I would use the statistics of the fully loaded table throughout the load process.
So, generate statistics when your table is fully loaded and then use those statistics for subsequent loads.
If you use statistics from when the table is half-loaded the optimizer may be tricked into not using indexes or not using the fastest index. This is especially true if data is loaded in order so that low value, high value and density are skewed.
In your case, the statistics for columns fkstring
and fkid
are extra important since those two columns are heavily involved in the procedure that has performance issues.
Upvotes: 2