Reputation: 2888
What I'm trying to do is write a stored procedure that will query a view, process each row, and make one or more inserts into a table for each row pulled from the view. Everything seems fine, except that, arbitrarily, partway through the process, the server seems to hang on the insert command. I have no idea if there's some memory limit on cursor result sets, or what could be happening. Relevant parts of the SP and a few clarifying comments are posted below.
CREATE PROCEDURE `Cache_Network_Observations` ()
BEGIN
    -- Declare all variables

    /* This cursor is hitting the view which should be returning a number of rows
       on the scale of ~5M+ records */
    DECLARE cursor1 CURSOR FOR
        SELECT * FROM usanpn2.vw_Network_Observation;

    CREATE TABLE Cached_Network_Observation_Temp (observation_id int, name varchar(100), id int);

    OPEN cursor1;
    load_loop: LOOP
        FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;
        IF id1 IS NOT NULL THEN
            INSERT INTO usanpn2.Cached_Network_Observation_Temp VALUES (observation_id, name1, id1);
        END IF;
        -- some additional logic here, essentially just the same as the above IF statement
    END LOOP;
    CLOSE cursor1;
END
That being the SP, when I actually run it, everything goes off without a hitch until the process runs and runs and runs. Taking a look at the active query report, I am seeing this:
| 1076 | root | localhost | mydb | Query | 3253 | update | INSERT INTO usanpn2.Cached_Network_Observation values ( NAME_CONST('observation_id',2137912), NAME_ |
Not positive where the NAME_CONST function is coming from or what that has to do with anything. I've tried this multiple times; the observation_id variable/row in the view varies each time, so it doesn't seem to be anything tied to the record.
TIA!
Upvotes: 2
Views: 1147
Reputation: 108370
I don't see a NOT FOUND handler for your fetch loop. There's no "exit" condition.
DECLARE done INT DEFAULT FALSE;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
Immediately following the fetch, test the done flag, and exit the loop when it's true.
IF done THEN
LEAVE load_loop;
END IF;
Without that, I think you have yourself a classic infinite loop.
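Putting the pieces together, the fetch loop would look something like the sketch below. (A sketch only: the DECLAREs for observation_id, id1, name1 and the rest of the fetch targets are assumed to exist in the full procedure, and the FETCH column list is copied from the question.)
DECLARE done INT DEFAULT FALSE;
-- declarations of observation_id, id1, name1, ... (omitted in the question) go here
DECLARE cursor1 CURSOR FOR
  SELECT * FROM usanpn2.vw_Network_Observation;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

OPEN cursor1;
load_loop: LOOP
  FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;
  IF done THEN
    LEAVE load_loop;
  END IF;
  IF id1 IS NOT NULL THEN
    INSERT INTO usanpn2.Cached_Network_Observation_Temp VALUES (observation_id, name1, id1);
  END IF;
END LOOP;
CLOSE cursor1;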
The statement shown in the SHOW FULL PROCESSLIST output is inserting to a different table. (There's no _Temp at the end of the table name.)
But why on earth do you need a cursor loop to process this row-by-agonizing-row?
If you need a table loaded, just load the flipping table, and be done with it.
Replace all of that "declare cursor", "open cursor", fetch loop, exit handler, individual insert statement nonsense with a single statement that does what you need done:
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s.observation_id, s.name1 AS `name`, s.id1 AS id
FROM usanpn2.vw_Network_Observation s
WHERE s.id1 IS NOT NULL
That is going to be way more efficient. And it won't clog up the binary logs with a bloatload of unnecessary INSERT statements. (This also has me wanting to back up to a bigger picture, and understand why this table is even needed. It also has me wondering whether vw_Network_Observation is a view, and whether the overhead of materializing a derived table is warranted. The predicate in that outer query isn't getting pushed down into the view definition. MySQL processes views much differently than other RDBMSs do.)
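One way to check (an assumption on my part about how you'd verify it, assuming a reasonably recent MySQL where a materialized view shows up with select_type DERIVED in the plan) is to run EXPLAIN against the same query:
EXPLAIN
SELECT s.observation_id, s.name1, s.id1
  FROM usanpn2.vw_Network_Observation s
 WHERE s.id1 IS NOT NULL;
A DERIVED row in that output indicates the view is being materialized as a derived table rather than merged into the outer query.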
EDIT
If the next part of the procedure that is commented out is checking whether id2 is not null to conditionally insert id2, name2 into the _Temp table, that can be done in the same way. Or, the multiple queries can be combined with the UNION ALL operator.
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s1.observation_id, s1.name1 AS `name`, s1.id1 AS id
FROM usanpn2.vw_Network_Observation s1
WHERE s1.id1 IS NOT NULL
UNION ALL
SELECT s2.observation_id, s2.name2 AS `name`, s2.id2 AS id
FROM usanpn2.vw_Network_Observation s2
WHERE s2.id2 IS NOT NULL
... etc.
FOLLOWUP
If we need to generate multiple rows out of a single row, and the number of rows isn't unreasonably large, I'd be tempted to test something like this, processing id1, id2, id3 and id4 in one fell swoop, using a CROSS JOIN of the row source (s) and an artificially generated set of four rows. That would generate four rows per row from the row source (s), and we can use conditional expressions to return id1, id2, etc.
As an example, something like this:
SELECT s.observation_id
, CASE n.i
WHEN 1 THEN s.id1
WHEN 2 THEN s.id2
WHEN 3 THEN s.id3
WHEN 4 THEN s.id4
END AS `id`
, CASE n.i
WHEN 1 THEN s.name1
WHEN 2 THEN s.name2
WHEN 3 THEN s.name3
WHEN 4 THEN s.name4
END AS `name`
FROM usanpn2.vw_Network_Observation s
CROSS
JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
HAVING `id` IS NOT NULL
We use a predicate in the HAVING clause rather than the WHERE clause because the value generated for the id column in the resultset isn't available when the rows are accessed. The predicates in the HAVING clause are applied nearly last in the execution plan, after the rows are accessed, just before the rows are returned. (I think a "filesort" operation to satisfy an ORDER BY, and the LIMIT clause, are applied after the HAVING.)
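If that resultset checks out, the same query can feed an INSERT ... SELECT into the _Temp table; a sketch, with the target column list reordered to match the select list:
INSERT INTO Cached_Network_Observation_Temp (observation_id, id, `name`)
SELECT s.observation_id
     , CASE n.i
         WHEN 1 THEN s.id1
         WHEN 2 THEN s.id2
         WHEN 3 THEN s.id3
         WHEN 4 THEN s.id4
       END AS `id`
     , CASE n.i
         WHEN 1 THEN s.name1
         WHEN 2 THEN s.name2
         WHEN 3 THEN s.name3
         WHEN 4 THEN s.name4
       END AS `name`
  FROM usanpn2.vw_Network_Observation s
 CROSS
  JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
HAVING `id` IS NOT NULL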
If the number of rows to be processed is "very large", then we may get better performance processing rows in several reasonably sized batches. If we do a batch size of two, processing two rows per INSERT, that effectively halves the number of INSERTs we need to run. With 4 rows per batch, we cut that in half again. Once we are up to a couple of dozen rows per batch, we've significantly reduced the number of individual INSERT statements we need to run.
As the batches get progressively larger, our performance gains become much smaller. Until the batches become unwieldy ("too large") and we start thrashing to disk. There's a performance "sweet spot" in there between the two extremes (processing one row at a time vs processing ALL of the rows in one batch).
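To illustrate what a "batch" means here (the values below are made up, purely for illustration), a multi-row INSERT sends several rows in a single statement:
INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
VALUES (101, 'name-a', 11)
     , (102, 'name-b', 12)
     , (103, 'name-c', 13)
     , (104, 'name-d', 14);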
Upvotes: 2