Reputation: 484
I am trying to split a SQL Query script, and on a create table, I only want the contents inside the outermost parenthesis, while ignoring any parenthesis inside, which are mainly describing maximum characters for a column type.
Here is a sample of my query:
CREATE TABLE IF NOT EXISTS %SCHEMA%.business (
id UUID NOT NULL,
name VARCHAR(50) NOT NULL,
DBA VARCHAR(50),
isactive BOOL NOT NULL DEFAULT TRUE,
isdeleted BOOL NOT NULL DEFAULT FALSE,
createdon TIMESTAMP(6) WITHOUT TIME ZONE NOT NULL
)
WITH (OIDS = FALSE)
I want to get back everything inside that describe the columns. I have tried these
\(([^\)]*)\)
stops at the first (50), and reads the other (50) and (6), but not anything else, and doesn't reach the end of that outside parenthesis.
\((\d+)\)
This will only read the inside parenthesis, which could be useful later, but I only get back the two (50)'s and the (6).
What regular expression will capture all of the inside of that parenthesis? Even if it does capture more, I can always remove the excess strings through code later.
Upvotes: 3
Views: 355
Reputation: 22817
.NET regular expressions have what's called Balancing Groups. This ensures that a starting delimiter is followed by a closing delimiter (both of your choice). If these do not balance it stops matching at that point in the string.
This is how you'd accomplish this using regex, but other alternatives are less breakable and may be better suited for the task.
(?<=\()(?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!))(?=\))
(?<=\()
Positive lookbehind ensuring what precedes is (
(?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!))
The magic. Simplistically put, this matches either any character that is not (
or )
, or it matches the (
character while setting B
to B+1
, or it matches )
while setting B
to B-1
. It does this one or more times. (?(B)(?!))
indicates that if B
is not balanced (anything except 0), it fails.(?=\))
Positive lookahead ensuring what follows is )
Results in the two following matches:
id UUID NOT NULL,
name VARCHAR(50) NOT NULL,
DBA VARCHAR(50),
isactive BOOL NOT NULL DEFAULT TRUE,
isdeleted BOOL NOT NULL DEFAULT FALSE,
createdon TIMESTAMP(6) WITHOUT TIME ZONE NOT NULL
and
OIDS = FALSE
To remove the latter result, you could use the following (ensures (
is not followed by WITH
) as seen in use here.
(?<=(?<!WITH *)\()(?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!))(?=\))
Upvotes: 5