Reputation: 1073
DELIMITER $$
CREATE PROCEDURE Load_Fact_List()
BEGIN
DECLARE Project_Number_Temp INT;
DECLARE Panel_Id_Temp INT;
DECLARE Employee_Id_Temp INT;
DECLARE Zip_Temp VARCHAR(255);
DECLARE Created_Date_Temp DATE;
DECLARE Country_Temp VARCHAR(255);
DECLARE no_more_rows BOOLEAN;
DECLARE loop_cntr INT DEFAULT 0;
DECLARE num_rows INT DEFAULT 0;
DECLARE load_cur CURSOR FOR
SELECT Project_Id, Panel_Id, Employee_Id, Zip, Created_Date
FROM Fact_List;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET no_more_rows = TRUE;
OPEN load_cur;
select FOUND_ROWS() into num_rows;
the_loop: LOOP
FETCH load_cur
INTO Project_Number_Temp, Panel_Id_Temp, Employee_Id_Temp, Zip_Temp, Created_Date_Temp;
IF no_more_rows THEN
CLOSE load_cur;
LEAVE the_loop;
END IF;
SET Country_Temp= (select Country from Zip where Zip= Zip_Temp);
INSERT INTO Test_Fact
(
Project_Key,
Campaign_Key,
Respondents_Key,
Event_Key,
Employee_Key,
Geography_Key,
Date_Key
)
SELECT (SELECT Project_Key from Project_Dim where Project_Id= Project_Number_Temp AND Quota_Country= Country_Temp),0,(SELECT MAX(Respondents_Key) from Respondents_Dim WHERE Panel_Id= Panel_Id_Temp),1,(select MAX(Employee_Key) from Employee_Dim WHERE Employee_Id= Employee_Id_Temp),(Select Geography_Key from Geography_Dim where Zip= Zip_Temp), (Select Date_Key from Date_Dim where Full_Date= Created_Date_Temp);
SET loop_cntr = loop_cntr + 1;
END LOOP the_loop;
select num_rows, loop_cntr;
END $$
The above code is properly working but it is damn slow. For every 1 hour it is loading 1000 records. I got lacks of records to load into fact table. can anyone suggest me any optimization?
Requirement is to load fact table by looping through other table and gathering required key values from dimension tables.
Upvotes: 0
Views: 148
Reputation: 51868
The usual procedure is actually like this.
You have your dimensions built and you just gathered the data you want to insert into your fact table in a temporary table. Then you insert this data in another temporary table like this:
INSERT INTO tmp_fact_table
(
fact_key,
dim1_key,
dim2_key,
...
fact1,
fact2
...
)
SELECT
ISNULL (f.fact_key, 0),
ISNULL (d1.sid, 0) as whatever,
ISNULL (d2.sid, 0) as whatever2,
...
ISNULL (tt.fact1, 0),
ISNULL (tt.fact2, 0)
FROM
yourTempTable tt
LEFT JOIN Dim1 d1 ON tt.identifying_column = d1.identifying_column
...
LEFT JOIN fact_table f ON
f.dim1_key = d1.sid
AND f.dim2_key = d2.sid
where
Then you will have a table where you have the IDs of your dimensions linked to the data you want to import into your fact table with 0 as fact key when the entry in the fact table does not already exist and the ID of the fact table entry otherwise.
Then you update the fact table where tmp_fact_table.fact_key != 0
Then you insert into the fact table where tmp_fact_table.fact_key = 0
That's it.
I'm doing this with millions of rows and it takes about half an hour. 300,000 rows is peanuts.
Upvotes: 1