David Wyly
David Wyly

Reputation: 1701

Regex for properly splitting apart nested SQL delimiters

I am dealing with a string that contains multiple SQL queries:

ALTER TABLE _version ADD test1 INT NOT NULL;
ALTER TABLE _version ADD test2 INT NOT NULL;
CREATE PROCEDURE test3 ()
        LANGUAGE SQL
        DETERMINISTIC
        SQL SECURITY DEFINER
        COMMENT 'A procedure'
        BEGIN
            SELECT 'Hello World !';
        END;

I would like to have this string split into an array of three separate queries, like so:

ALTER TABLE _version ADD test1 INT NOT NULL;

ALTER TABLE _version ADD test5 INT NOT NULL;

CREATE PROCEDURE test3 ()
        LANGUAGE SQL
        DETERMINISTIC
        SQL SECURITY DEFINER
        COMMENT 'A procedure'
        BEGIN
            SELECT 'Hello World !';
        END;

This will allow for me to make multiple queries using PDO. Because unfortunately (to the best of my knowledge) you can't use PDO to make multiple transactional queries from one statement. That, and you can't use the DELIMITER $$ SQL command in a PDO query.

However, notice the ; delimiter between the BEGIN and END. This causes the problem! If I were to simply explode the string using the ; delimiter, it would not separate out the procedure properly.

I've been messing around with regular expressions to find everything NOT between a BEGIN and END, and then replace those delimiters with something else (like a $$ delimiter) -- and then do the explode off of the $$ delimiter -- but nothing seems to be working.

Here's a (bad) stab at what I've tried so far in PHP and regex, where $sqlString is the string with the multiple queries:

$sqlString = preg_replace("#(?<!BEGIN.+);(?!.+END)#",'$$',$sqlString);
$splitQueries = explode("$$",$sqlString);

But I can't get the negative look-aheads/look-behinds working. Please help me figure out either the right regex pattern or the right direction!

Upvotes: 2

Views: 80

Answers (1)

Sam
Sam

Reputation: 20486

Disclaimer: this just does exactly what you asked and splits the input after the ;, unless between BEGIN/END statements (it will fail on something like WHERE column = ';'.

This PHP statement will do exactly what you need it to do (RegEx demo):

$splitQueries = preg_split('/(?<=;)(?!(?:(?!BEGIN).)*END)/s', $sqlString);
array_pop($splitQueries); // there is an extra value in the array, assuming
                          // your query ends in ; 

First, I used a positive lookbehind for the semicolon so that you can just use one simple preg_split() call (since it won't actually be matching the ;, but the space after it).

Next, I followed that by a negative lookahead: (?!.*END). This is where the s modifier was necessary since it makes . match newline characters.

Finally, I replaced the .* in the negative lookahead to with another negative lookahead: (?:(?!BEGIN).)*.

Our final result looks for a semicolon (the space right after it, technically) and then looks at every following character (unless we see BEGIN) to make sure that there is no END (signalling the fact that we're inside a BEGIN/END statement). Fun!

Upvotes: 2

Related Questions