Mikhail T.

Reputation: 4017

Need help splitting text in Java using a Pattern with look-ahead

The SQL texts that our program is expected to process may have special comments embedded in them, which require special handling by the program rather than by the remote server.

I'm trying to split the input SQL into chunks consisting of either those special comments or anything in between. I almost got it working with this hairy pattern:

private static final Pattern SQLSplitter = Pattern.compile(
    "[\\s;]*\\s*(?=((" +
    "-- INPUT_FILE_NAME:" + '|' +
    "-- OUTPUT_FILE_NAME:" + ").+\\R|" +
    "((CREATE|ALTER)\\s+PROCEDURE)))[\\s;]*|" +
    "(?<=END)[\\s]*;[;\\s]+",
    Pattern.MULTILINE|Pattern.CASE_INSENSITIVE);
...
for (String part : SQLSplitter.split(sql)) {
        part = part.trim();
        if (part.isEmpty() || part.equals("!"))
            continue;
...
}

However, for an input blob like:

-- OUTPUT_FILE_NAME: /tmp/7475993877025492664.txt
;

;ALTER PROCEDURE #MEOW_sp
AS
BEGIN
SELECT BAR, FOO FROM #MEOW;
END

;
-- INPUT_FILE_NAME: /dev/null! #MEOW! false! ,!


exec #MeOw_sp;

I get these three parts:

1.

-- OUTPUT_FILE_NAME: /tmp/7475993877025492664.txt

2.

ALTER PROCEDURE #MEOW_sp
AS
BEGIN
SELECT BAR, FOO FROM #MEOW;
END

3.

-- INPUT_FILE_NAME: /dev/null! #MEOW! false! ,!


exec #MeOw_sp;

Note how the last line remains "glued" to the special comment before it. How do I fix the splitting regexp so that the same input is split into four parts instead, the first two being the same as before, but the rest being:

3.

-- INPUT_FILE_NAME: /dev/null! #MEOW! false! ,!

4.

exec #MeOw_sp;

I thought that by adding the [^\\n]+\\n+ I would make sure the special-comment parts end at the new line, but they don't...
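For what it's worth, a look-ahead is zero-width: whatever sits inside (?=...) is only checked, never consumed, so the .+\\R in there can mark where a delimiter may start but cannot make the comment line itself end a part. One possible workaround, sketched below under that assumption, leaves the pattern untouched and post-processes each part instead: if a part begins with one of the special comments, it is cut at the end of the comment's line. The class name SqlSplitSketch, the helper pattern SPECIAL_COMMENT and the method splitSql are hypothetical names made up for this illustration, and the sketch has only been reasoned through against the sample input above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlSplitSketch {
    // The splitting pattern from the question, unchanged.
    private static final Pattern SQL_SPLITTER = Pattern.compile(
        "[\\s;]*\\s*(?=((" +
        "-- INPUT_FILE_NAME:" + '|' +
        "-- OUTPUT_FILE_NAME:" + ").+\\R|" +
        "((CREATE|ALTER)\\s+PROCEDURE)))[\\s;]*|" +
        "(?<=END)[\\s]*;[;\\s]+",
        Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: one special comment up to the end of its line.
    // '.' does not match line terminators by default, so ".*" stops there.
    private static final Pattern SPECIAL_COMMENT = Pattern.compile(
        "-- (?:INPUT|OUTPUT)_FILE_NAME:.*", Pattern.CASE_INSENSITIVE);

    static List<String> splitSql(String sql) {
        List<String> result = new ArrayList<>();
        for (String part : SQL_SPLITTER.split(sql)) {
            part = part.trim();
            if (part.isEmpty() || part.equals("!"))
                continue;
            Matcher m = SPECIAL_COMMENT.matcher(part);
            if (m.lookingAt()) {
                // The part starts with a special comment: keep the comment
                // line on its own and emit whatever follows separately.
                result.add(m.group());
                String rest = part.substring(m.end()).trim();
                if (!rest.isEmpty())
                    result.add(rest);
            } else {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sql =
            "-- OUTPUT_FILE_NAME: /tmp/7475993877025492664.txt\n" +
            ";\n\n" +
            ";ALTER PROCEDURE #MEOW_sp\nAS\nBEGIN\n" +
            "SELECT BAR, FOO FROM #MEOW;\nEND\n\n;\n" +
            "-- INPUT_FILE_NAME: /dev/null! #MEOW! false! ,!\n\n\n" +
            "exec #MeOw_sp;\n";
        int n = 1;
        for (String part : splitSql(sql))
            System.out.println((n++) + ".\n" + part + "\n");
    }
}
```

The second pass costs one extra match per part, but it sidesteps Java's restriction on unbounded look-behind, which is what a split-only solution would seem to need in order to cut right after the comment line.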

Upvotes: 0

Views: 62
