Riera
Riera

Reputation: 369

Extract SQL SELECTs queries with regex in C#

Please consider this string:

string mySelects = @"SELECT * FROM TABLE1 WHERE 1=1
SELECT T.* 
FROM (SELECT * FROM TABLE2 WHERE 2=2) T
ORDER BY FOO
SELECT FIELD1, FIELD3
FROM TABLE3
GROUP BY FIELD";

Is there any regular expression to return a MatchCollection with these three Selects in each result?

Thanks in advance!

Upvotes: 0

Views: 689

Answers (2)

Spook
Spook

Reputation: 25927

There is none.

Mathematically speaking, a regular expression describes some set of strings (these, which match this regular expression). Similarly a grammar describes a set of strings (which match this grammar). But grammars are a superset over regular expressions, because every regular expression can be described by a grammar, but unfortunately not the other way around.

SQL is a language, which is described by some grammar. But this grammar is way too complicated to be described by a regular expression. You need some more mathematical firepower to process this language.

The solution is to seek for a ready-made SQL parser or write one using tools like ANTLR.

Upvotes: 1

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726579

There is no regex like that, and it is not possible to write one: the regex programming model is not powerful enough to deal with counting. It is not powerful enough even to balance parentheses in an expression - something that would be required to capture the third select in your example. The fact that .NET's regex engine supports balancing groups is not going to help you much: regex is hopelessly inadequate for the task at hand.

You need something more powerful than the regex - preferably, a complete SQL parser. I have tried this parser (registration is required), it worked fine on many queries. It also has a source in case you wish to adjust it to your needs.

Upvotes: 2

Related Questions