Reputation: 1368
I need to build a regular expression for C# in order to capture all table names that are after 'from' keywords in select clauses. For example
.
.
.
SELECT field1, field2
FROM table1
WHERE condition1
.
.
.
SELECT field3, field4
FROM table2
WHERE condition2
.
.
.
There are multiple select clauses in the files that I try to read, and there can be any character between select and from(including new lines, ':', '_' and any other character). How should I construct my regex in order to get all table names?
Thanks
EDIT: I' ve found a way to get all table names.
\s*SELECT[^;]*FROM\s*(?<key>[^\n]*)
This might help someone on some day. Thanks
Upvotes: 3
Views: 2474
Reputation: 50241
Unless all queries have only one table and there is no funny business with comments, using regular expressions for this is a losing proposition. Instead:
SET SHOWPLAN_ALL ON;
--All your queries here
See set showplan_all docs.
When I say that the regular expression is going to be complicated, this is what I mean. These are only some of the considerations:
You'll have to detect the start of quoted strings: "
, '
, and [
and ignore all characters inside until terminated properly. Do not terminate if the end character is doubled up (i.e., 'this is ''fun'', he said'
does not stop after is
).
You'll have to exclude single line comments --
that aren't inside of quotes, and terminate them at the next CRLF. Quote marks inside comments do not start strings as usual.
You'll have to exclude multi-line comments (starting with /*
) that aren't inside of quotes or inside of a single line comment, and then skip over everything else except the terminator, */
. In your regex, be sure to escape the *
character with a backslash \
.
You'll have to then find valid FROM
clauses with proper word boundaries (no false match on column names SelfRom
or AfroMonkey
, for example).
To properly terminate the FROM clause, you have to stop capturing when you see any keyword including WHERE
, GROUP BY
, HAVING
, ORDER BY
, and WITH
; and since SQL queries aren't required to have the semicolon terminator ;
then you'll also have to terminate at SELECT
, DBCC
, SET
, CREATE
, ALTER
, DROP
and so on and so forth.
But even the previous two points are not really enough by themselves, because what if your query looks like this:
SELECT *
FROM
MyTable T
INNER JOIN (
SELECT * FROM YourTable Y WHERE Active = 1
) X ON T.ID = Y.ID
INNER JOIN AnotherTable A
ON X.AID = A.AID
Now you have to parse parentheses and NOT stop capturing your FROM clause when you see any of those keywords. And you have to keep track of how many parentheses deep you are and keep ignoring until you're that many out. And finally, what do you do with those, because a derived table is just like a table--do you want the full text of the derived table or just the tables inside that?
To do all that, you can't just start matching at the first valid FROM
in the text, because this could be inside of quotes or comments. You have to match all the text starting at the beginning, because that is the only way with Regex to ensure you don't find a match where you shouldn't.
Here's what I came up with that just tries to handle comments. Not even quotes. And it only goes as far as finding the from clause, not what's inside it. Plus, we have to prevent parentheses from capturing so we don't have an awful mess when examining our capturing groups for the actual FROM clauses.
(?:(?:-(?!-)|/(?!\*)|f(?!rom)|[^-f/])|--[^\n]*\n|/\*(?:\*/)*\*/)*from()
And it's probably rife with errors and I'd have to rethink the whole thing once I played with it a bit and all in all, it would be a gigantic waste of time.
I think you are underestimating how difficult such a thing will be to accomplish well. But there's a perfectly reliable solution! The one I gave above: let SQL Server parse everything for you. You can parse the returned plan easily because it is structured in a way that makes it easy.
Upvotes: 3
Reputation: 1969
var input = "select name from Table1 where id =2";
var pattern = @"from\s*(.*?)\s*where"; // where car= is the first delimiter and ; is the second one
var result = Regex.Match(input, pattern).Groups[1].Value;
MessageBox.Show(result);
Upvotes: 0
Reputation: 879
To start I'd say check out this tutorial: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
check out a regex like this: ((?<=FROM )[^\s]+)
This regex will start reading directly after "FROM "
(including whitespace) and stops reading at the first whitespace (^\s)
if you try something like this
foreach (Match m in Regex.Matches(input, @"((?<=FROM )[^\s]+)")
{
string output = m.Value;
}
EDIT:
I'm not 100% sure about this regex. If you have a newline character at the end of your name this woudl probably do the job better: @"((?<=FROM)[^\n]+))
but you should probably trim the output as there might be a whitespace at the start of your output string.
Upvotes: 1