user1597811
user1597811

Reputation: 1073

Stored Procedure taking ages to execute?

DELIMITER $$

CREATE PROCEDURE Load_Fact_List()

BEGIN

  DECLARE Project_Number_Temp INT;
  DECLARE Panel_Id_Temp INT;
  DECLARE Employee_Id_Temp INT;
  DECLARE Zip_Temp VARCHAR(255);
  DECLARE Created_Date_Temp DATE;
  DECLARE Country_Temp VARCHAR(255);


  DECLARE no_more_rows BOOLEAN;
  DECLARE loop_cntr INT DEFAULT 0;
  DECLARE num_rows INT DEFAULT 0;


  DECLARE   load_cur    CURSOR FOR
SELECT  Project_Id, Panel_Id, Employee_Id, Zip, Created_Date
    FROM  Fact_List;



  DECLARE CONTINUE HANDLER FOR NOT FOUND
    SET no_more_rows = TRUE;


  OPEN load_cur;
  select FOUND_ROWS() into num_rows;

  the_loop: LOOP

    FETCH  load_cur
    INTO   Project_Number_Temp, Panel_Id_Temp, Employee_Id_Temp, Zip_Temp, Created_Date_Temp;


    IF no_more_rows THEN
        CLOSE load_cur;
        LEAVE the_loop;
    END IF;

SET Country_Temp= (select Country from Zip where Zip= Zip_Temp);

INSERT INTO Test_Fact
(   
        Project_Key, 
        Campaign_Key, 
        Respondents_Key, 
        Event_Key, 
        Employee_Key, 
        Geography_Key, 
        Date_Key    
)

SELECT (SELECT Project_Key from Project_Dim where Project_Id= Project_Number_Temp AND Quota_Country= Country_Temp),0,(SELECT MAX(Respondents_Key) from Respondents_Dim WHERE Panel_Id= Panel_Id_Temp),1,(select MAX(Employee_Key) from Employee_Dim WHERE Employee_Id= Employee_Id_Temp),(Select Geography_Key from Geography_Dim where Zip= Zip_Temp), (Select Date_Key from Date_Dim where Full_Date= Created_Date_Temp);

    SET loop_cntr = loop_cntr + 1;
  END LOOP the_loop;


  select num_rows, loop_cntr;


END $$

The above code is properly working but it is damn slow. For every 1 hour it is loading 1000 records. I got lacks of records to load into fact table. can anyone suggest me any optimization?

Requirement is to load fact table by looping through other table and gathering required key values from dimension tables.

Upvotes: 0

Views: 148

Answers (1)

fancyPants
fancyPants

Reputation: 51868

The usual procedure is actually like this.

You have your dimensions built and you just gathered the data you want to insert into your fact table in a temporary table. Then you insert this data in another temporary table like this:

INSERT INTO tmp_fact_table
(
fact_key,
dim1_key,
dim2_key,
...
fact1,
fact2
...
)
SELECT 
ISNULL (f.fact_key, 0),
ISNULL (d1.sid, 0) as whatever,
ISNULL (d2.sid, 0) as whatever2,
...
ISNULL (tt.fact1, 0),
ISNULL (tt.fact2, 0)
FROM
yourTempTable tt 
LEFT JOIN Dim1 d1 ON tt.identifying_column = d1.identifying_column
...
LEFT JOIN fact_table f ON 
f.dim1_key = d1.sid
AND f.dim2_key = d2.sid

where

  • fact_key is the identifying column in your fact table
  • dim1_key is the foreign key in your fact table to the dimensions
  • fact1 and so on are the facts you want in your fact table, clear
  • the ISNULL() function returns 0 when no entry is found. 0 is the id of your dummy row in each dimension for unknown data

Then you will have a table where you have the IDs of your dimensions linked to the data you want to import into your fact table with 0 as fact key when the entry in the fact table does not already exist and the ID of the fact table entry otherwise.

Then you update the fact table where tmp_fact_table.fact_key != 0

Then you insert into the fact table where tmp_fact_table.fact_key = 0

That's it.

I'm doing this with millions of rows and it takes about half an hour. 300,000 rows is peanuts.

Upvotes: 1

Related Questions