Reputation: 123
I have searched high and low and am really struggling to find the appropriate REGEX that would help me retrieve what I want. Assume I have the following query string:
SELECT col, col2 AS c2, col3, col * col2 calc FROM...
I want a REGEX that will pull out everything between commas where there is a space or an "AS" that proceeds a column alias. It should also exclude the SELECT and FROM. With the example above, I would want the following matches:
col2 AS c2
col * col2 calc
Essentially, this should remove col and col3 because they don't have aliases. I have tried variations of the following but have been unsuccessful (Note: $colname contains the alias):
/(SELECT|,)([^,]*{$colname}[^,]*)(FROM|,)/
Currently this only returns:
, col2 AS c2,
col2 AS c2
, col2 AS c2,
col2 AS c2
I feel I am close but I need some guidance. Any suggestions are appreciated.
Upvotes: 2
Views: 1867
Reputation: 626929
It is next to impossible to handle SQL string parsing with one regex, you should look for a more generic solution like PHP-SQL-Parser.
The regex I have come up with is an ad-hoc work around that can be further precised, adjusted, etc. to fit the needs of a specific project:
(?:\G(?!\A),|SELECT)(?:\s+[^\s,]+(?:\([^()]*\))?,)*\s*\K[^\s,]+\s*(?:(?![()',])[[:punct:]]|AS)\s*(?:(?!FROM)[^,])*(?!\w)
See the regex demo.
Here is what it does:
(?:\G(?!\A),|SELECT)
- the end of the previous successful match and a comma after (\G(?!\A),
) or (|
) a literal SELECT
substring(?:\s+[^\s,]+(?:\([^()]*\))?,)*
- zero or more sequences ((?:...)*
) of:
\s+
- 1+ whitespaces[^\s,]+
- 1+ chars other than whitespace and a comma(?:\([^()]*\))?
- an optional substring like (...)
(a (
, then 0+ chars other than (
and )
, and then a )
),
- a comma\s*
- 0+ whitespace chars\K
- a match reset operator discarding the text matched so far (so, SELECT abc, Contains(...),
will get omitted from the match, but will be required for the match)[^\s,]+
- 1 or more chars other than whitespace and ,
\s*
- 0+ whitespace chars(?:(?![()',])[[:punct:]]|AS)
- either
(?![()',])[[:punct:]]
- any punctuation or symbol char but a (
, )
, '
and ,
|
- orAS
- a literal AS
substring\s*
- 0+ whitespaces(?:(?!FROM)[^,])*
- zero or more chars other than a comma not starting a FROM
char sequence (it is a tempered greedy token)(?!\w)
- not followed with a word char.Upvotes: 2