Thomas Tschernich
Thomas Tschernich

Reputation: 1282

Oracle indexes "breaking"

I am working on a data warehousing project, and therefore, I have been implementing some ETL Functions in Packages. I first encountered a problem on my developing laptop and thought it had something to do with my oracle installation, but now it has "spread" over to the production servers. Two functions "sometimes" become incredible slow. We have implemented a logging system, giving us output on a logging table each x rows. When the function usually needs like 10 seconds per chunk, "sometimes" the functions needs up to 3 minutes. After rebuilding some indexes and restarting the function, it is as quick again as it used to be. Unfortunately, I can't tell which index it is exactly, since restarting the function and building up the cursor it uses for its work takes some time and we do not have the time to check each index on its own, so I just rebuild all indexes that are potentially used by the function and restart it.

The functions that have the problem use a cursor to select data from a table with about 50 million to 200 million entries, joined by a small table with about 50-500 entries. The join condition is a string comparison. We then use the primary key from the small table we get from the join to update a foreign key on the main table. The update process is done by a forall loop, this has proven to save loads of time.

Here is a simplified version of the table structure of both tables:

CREATE TABLE "maintable" 
(   "pkmid" NUMBER(11,0) NOT NULL ENABLE, 
"fkid" NUMBER(11,0), 
"fkstring" NVARCHAR2(4) NOT NULL ENABLE, 
 CONSTRAINT "PK_MAINTABLE" PRIMARY KEY ("pkmid");

CREATE TABLE "smalltable" 
(   "pksid" NUMBER(11,0) NOT NULL ENABLE, 
"pkstring" NVARCHAR2(4) NOT NULL ENABLE, 
 CONSTRAINT "PK_SMALLTABLE" PRIMARY KEY ("pksid");

Both tables have indexes on their string columns. Adding the primary keys, I therefore rebuild 4 indexes each time the problem happens.

We get our data in a way, that we only have the fkstring in the maintable available and the fkid is set to null. In a first step, we populate the small table. This only takes minutes and is done the following way:

INSERT INTO smalltable (pksid, pkstring)
    SELECT SEQ_SMALLTABLE.NEXTVAL, fkstring
        FROM 
        (
            SELECT DISTINCT mt.fkstring
                FROM maintable mt                        
            MINUS
            SELECT st.pkstring
                FROM smalltable st
        );
        commit;

This function never causes any trouble.

The following function does (it is a simplified version of the function - I have removed logging and exception handling and renamed some variables):

function f_set_fkid return varchar2 is

    cursor lCursor_MAINTABLE is
        SELECT MT.PKmID, st.pksid
            FROM maintable mt
            JOIN smalltable st ON (mt.fkstring = st.pkstring)
            WHERE mt.fkid IS NULL;
    lIndex number := 0;
    lExitLoop boolean := false;

    type lCursorType is table of lCursor_MAINTABLE%rowtype index by pls_integer;
    lCurrentRow lCursor_MAINTABLE%rowtype;
    lTempDataArray lCursorType;
    lCommitEvery constant number := 1000;

    begin

        open lCursor_MAINTABLE;
            loop

                -- get next row, set exit condition
                fetch lCursor_MAINTABLE into lCurrentRow;
                if (lCursor_MAINTABLE%notfound) then
                    lExitLoop := true;
                end if;

                -- in case of cache being full, flush cache
                if ((lTempDataArray.count > 0) AND (lIndex >= lCommitEvery OR lExitLoop)) then
                    forall lIndex2 in lTempDataArray.FIRST..lTempDataArray.LAST
                        UPDATE maintable mt 
                            set fkid = lTempDataArray(lIndex2).pksid
                            WHERE mt.pkmid = lTempDataArray(lIndex2).pkmid;     
                    commit;
                    lTempDataArray.delete;
                    lIndex := 0;                           
                end if;                                                 

                -- data handling, fill cache
                if (lExitLoop = false) then
                    lIndex := lIndex + 1;                                 
                    lTempDataArray(lIndex). := lCurrentRow;
                end if;

                exit when lExitLoop;

            end loop;

        close lCursor_MAINTABLE; 

        return null;

    end;  

I would be very thankful for any help.

P.S. I do know that bulk collect into would speed up the function and probably also ease up the code a bit too, but at the moment we are content with the speed of the function it usually has. Changing the function to use bulk collect is on our plan for next year, but at the moment it is not an option (and I doubt it would solve this index problem).

Upvotes: 1

Views: 674

Answers (2)

Rob van Laarhoven
Rob van Laarhoven

Reputation: 8905

function f_set_fkid return varchar2 is 

        cursor lCursor_MAINTABLE is 
            SELECT MT.PKmID, st.pksid 
                FROM maintable mt 
                JOIN smalltable st ON (mt.fkstring = st.pkstring) 
                WHERE mt.fkid IS NULL; 
        commit_every INTGER := 1000000;
        commit_counter INTEGER :=0;     
begin 
   for c in lCursor_MAINTABLE 
   loop
      UPDATE maintable mt  
      set fkid = c.pksid 
      WHERE mt.pkmid = c.pkmid;      
      commit_counter := commit_counter+1;
      if mod(commit_every,commit_counter) = 0
      then
         commit;
         commit_counter := 0;
      end if;
   end loop; 
   return null; 
end;   

Upvotes: 0

Klas Lindbäck
Klas Lindbäck

Reputation: 33273

If you have a table where the number of rows fluctuates wildly (as in when doing ETL loads) I would use the statistics of the fully loaded table throughout the load process.

So, generate statistics when your table is fully loaded and then use those statistics for subsequent loads.

If you use statistics from when the table is half-loaded the optimizer may be tricked into not using indexes or not using the fastest index. This is especially true if data is loaded in order so that low value, high value and density are skewed.

In your case, the statistics for columns fkstring and fkid are extra important since those two columns are heavily involved in the procedure that has performance issues.

Upvotes: 2

Related Questions