user470714

Reputation: 2888

MySQL Stored Procedure Insert Hanging

What I'm trying to do is write a stored procedure that will query a view, process each row, and make one or more inserts into a table for each row pulled from the view. Everything seems fine, except that at some arbitrary point mid-way through the process, the server seems to hang on the insert command. I have no idea if there's some memory limit on cursor result sets, or what could be happening. Relevant parts of the SP and a few clarifying comments are posted below.

CREATE PROCEDURE `Cache_Network_Observations` ()
BEGIN

-- Declare all variables

/* This cursor is hitting the view which should be returning a number of rows on the scale of ~5M+ records
*/
DECLARE cursor1 CURSOR FOR
SELECT * FROM usanpn2.vw_Network_Observation;

CREATE TABLE Cached_Network_Observation_Temp (observation_id int, name varchar(100), id int);

OPEN cursor1;

load_loop: loop

FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;

    IF id1 IS NOT NULL THEN
        INSERT INTO usanpn2.Cached_Network_Observation_Temp values (observation_id, name1, id1);
    END IF;   

    -- some additional logic here, essentially just the same as the above if statement 

END LOOP;
CLOSE cursor1;

END

That being the SP, when I actually run it, everything goes off without a hitch until the process runs and runs and runs. Taking a look at the active query report, I am seeing this:

| 1076 | root    | localhost                              | mydb | Query   | 3253 | update | INSERT INTO usanpn2.Cached_Network_Observation values ( NAME_CONST('observation_id',2137912),  NAME_ |

Not positive where the NAME_CONST function is coming from or what that has to do with anything. I've tried this multiple times, and the observation_id variable / row in the view varies each time, so it doesn't seem to be anything tied to a particular record.

TIA!

Upvotes: 2

Views: 1147

Answers (1)

spencer7593

Reputation: 108370

I don't see a NOT FOUND handler for your fetch loop. There's no "exit" condition.

DECLARE done INT DEFAULT FALSE;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

Immediately following the fetch, test the done flag, and exit the loop when it's true.

IF done THEN
  LEAVE load_loop;
END IF;

Without that, I think you have yourself a classic infinite loop.
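Putting those two pieces together with the question's loop, a minimal sketch of the corrected body might look like this (the other variable declarations are elided, just as they are in the question):

DECLARE done INT DEFAULT FALSE;
-- ... other variable declarations ...
DECLARE cursor1 CURSOR FOR
SELECT * FROM usanpn2.vw_Network_Observation;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

OPEN cursor1;
load_loop: LOOP
  FETCH cursor1 INTO observation_id, id1, name1, id2, name2, id3, name3, gid1, gname1, gid2, gname2, gid3, gname3;
  IF done THEN
    LEAVE load_loop;
  END IF;
  IF id1 IS NOT NULL THEN
    INSERT INTO usanpn2.Cached_Network_Observation_Temp VALUES (observation_id, name1, id1);
  END IF;
END LOOP;
CLOSE cursor1;

Note that MySQL requires the declarations in that order: variables first, then cursor declarations, then handler declarations.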


The statement shown in the SHOW FULL PROCESSLIST output is inserting to a different table. (There's no _Temp at the end of the tablename.)


But why on earth do you need a cursor loop to process this row-by-agonizing-row?

If you need a table loaded, just load the flipping table, and be done with it.

Replace all of that "declare cursor", "open cursor", fetch loop, exit handler, individual insert statement nonsense with a single statement that does what you need done:

INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s.observation_id, s.name1 AS `name`, s.id1 AS id
  FROM usanpn2.vw_Network_Observation s
 WHERE s.id1 IS NOT NULL

That is going to be way more efficient. And it won't clog up the binary logs with a bloatload of unnecessary INSERT statements. (This also has me wanting to back up to the bigger picture and understand why this table is even needed. It also has me wondering how vw_Network_Observation is defined as a view, and whether the overhead of materializing it as a derived table is warranted. The predicate in that outer query isn't getting pushed down into the view definition; MySQL processes views much differently than other RDBMSs do.)

EDIT

If the next part of the procedure that is commented out is checking whether id2 is not null to conditionally insert id2,name2 to the _Temp table, that can be done in the same way.
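For example, a second statement along the same lines (assuming the id2/name2 columns follow the same pattern as id1/name1):

INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s.observation_id, s.name2 AS `name`, s.id2 AS id
  FROM usanpn2.vw_Network_Observation s
 WHERE s.id2 IS NOT NULL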

Or, the multiple queries can be combined with the UNION ALL operator.

INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)

SELECT s1.observation_id, s1.name1 AS `name`, s1.id1 AS id
  FROM usanpn2.vw_Network_Observation s1
 WHERE s1.id1 IS NOT NULL

 UNION ALL

SELECT s2.observation_id, s2.name2 AS `name`, s2.id2 AS id
  FROM usanpn2.vw_Network_Observation s2
 WHERE s2.id2 IS NOT NULL

... etc.

FOLLOWUP

If we need to generate multiple rows out of a single row, and the number of rows isn't unreasonably large, I'd be tempted to test something like this, processing id1, id2, id3 and id4 in one fell swoop, using a CROSS JOIN of the row source (s) and an artificially generated set of four rows.

That would generate four rows per row from the row source (s), and we can use conditional expressions to return id1, id2, etc.

As an example, something like this:

SELECT s.observation_id
     , CASE n.i
         WHEN 1 THEN s.id1
         WHEN 2 THEN s.id2
         WHEN 3 THEN s.id3
         WHEN 4 THEN s.id4
       END AS `id`
     , CASE n.i
         WHEN 1 THEN s.name1
         WHEN 2 THEN s.name2
         WHEN 3 THEN s.name3
         WHEN 4 THEN s.name4
       END AS `name`
  FROM usanpn2.vw_Network_Observation s
 CROSS
  JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
HAVING `id` IS NOT NULL

We use a predicate in the HAVING clause rather than the WHERE clause because the value generated for the id column in the resultset isn't available when the rows are accessed. The predicates in the HAVING clause are applied nearly last in the execution plan, after the rows are accessed, just before the rows are returned. (I think a "filesort" operation to satisfy an ORDER BY, and the LIMIT clause are applied after the HAVING.)
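If that SELECT returns what's needed, it can feed the same kind of INSERT as before. A sketch, reusing the column list from the earlier statements (and keeping the same assumed id4/name4 columns):

INSERT INTO Cached_Network_Observation_Temp (observation_id, id, `name`)
SELECT s.observation_id
     , CASE n.i WHEN 1 THEN s.id1 WHEN 2 THEN s.id2 WHEN 3 THEN s.id3 WHEN 4 THEN s.id4 END AS `id`
     , CASE n.i WHEN 1 THEN s.name1 WHEN 2 THEN s.name2 WHEN 3 THEN s.name3 WHEN 4 THEN s.name4 END AS `name`
  FROM usanpn2.vw_Network_Observation s
 CROSS
  JOIN ( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
HAVING `id` IS NOT NULL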

If the number of rows to be processed is "very large", then we may get better performance processing rows in several reasonably sized batches. If we do a batch size of two, processing two rows per INSERT, that effectively halves the number of INSERTs we need to run. With 4 rows per batch, we cut that in half again. Once we are up to a couple of dozen rows per batch, we've significantly reduced the number of individual INSERT statements we need to run.

As the batches get progressively larger, our performance gains become much smaller. Until the batches become unwieldy ("too large") and we start thrashing to disk. There's a performance "sweet spot" in there between the two extremes (processing one row at a time vs processing ALL of the rows in one batch).
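As a rough sketch of the batching idea, one option is to drive the INSERT ... SELECT off a key range. This assumes observation_id is reasonably dense and can serve as a batch key (the question doesn't tell us either way), with @lo and @hi bounding the current batch:

INSERT INTO Cached_Network_Observation_Temp (observation_id, `name`, id)
SELECT s.observation_id, s.name1 AS `name`, s.id1 AS id
  FROM usanpn2.vw_Network_Observation s
 WHERE s.id1 IS NOT NULL
   AND s.observation_id >= @lo
   AND s.observation_id <  @hi

Then repeat with the next range until all of the rows are covered.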

Upvotes: 2
