Reputation: 2153
I'm currently writing a parser for ColdFusion code. I'm using a regex (in C#) to extract the value of the datasource attribute of the cfquery tag.
For the time being, the regex is the following:
<cfquery\s.*datasource\s*=\s*(?:'|")(.*)(?:'|")
It works well for strings like
<cfquery datasource="myDS"
or
<cfquery datasource='myDS'
But it breaks when parsing strings like
<cfquery datasource="#GetSourceName('myDS')#"
Obviously the (?:'|") part of the regex is the cause. Is there a way to match a closing single quote only when the opening quote was a single quote, and a closing double quote only when the opening quote was a double quote?
Thanks in advance!
Upvotes: 9
Views: 7112
Reputation: 118
I would suggest using two different regexes if possible, or splitting the regex in a different way.
For a single regex, considering the question @Mike posted,
("[^"]*")|('[^']*')
Then you can parse out the quotes.
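As a rough sketch of the alternation idea, here it is in Python's re module, used purely for illustration (the character classes work the same way in .NET). The datasource\s*=\s* prefix and the helper name extract_datasource_alt are my own additions, not part of the pattern above; the surrounding quotes are stripped from the match afterwards.

```python
import re

# Either a double-quoted run with no double quotes inside,
# or a single-quoted run with no single quotes inside.
# The "datasource\s*=\s*" prefix is an assumption added for this example.
QUOTED_VALUE = re.compile(r"""datasource\s*=\s*("[^"]*"|'[^']*')""")

def extract_datasource_alt(tag):
    """Return the datasource value without its quotes, or None if absent."""
    m = QUOTED_VALUE.search(tag)
    if m is None:
        return None
    # group(1) still includes the surrounding quotes, so slice them off.
    return m.group(1)[1:-1]

print(extract_datasource_alt('<cfquery datasource="myDS"'))
print(extract_datasource_alt("<cfquery datasource='myDS'"))
print(extract_datasource_alt('''<cfquery datasource="#GetSourceName('myDS')#"'''))
```

Because each branch forbids only its own quote character, the single quotes inside the double-quoted value in the third call do not end the match early.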
The other potential way of doing this is by using lookahead/lookbehind, but that tends to get messy and isn't universally supported.
Upvotes: 1
Reputation: 3743
Edit: I think this should work in C#; you just need a back reference:
datasource\s*=\s*('|")(.*)(?:\1)
Note that $1 is replacement-string syntax, so it will not act as a back reference inside the pattern itself; use \1 there.
This matches datasource="#GetSourceName('myDS')#", using \1 as a back reference to the first capture group.
Of course, you cannot make the first group non-capturing with ?: and still have this work. Also, you may want to make the quantifier lazy (.*? instead of .*) so that the match stops at the first matching closing quote rather than running on to a later one.
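Here is a minimal sketch of the back-reference approach with a lazy quantifier. It is written with Python's re module only for illustration; the \1 syntax is the same in .NET. The helper name extract_datasource is hypothetical.

```python
import re

# Group 1 captures whichever quote opens the value; \1 then requires the
# same character to close it. The lazy .*? stops at the first such quote.
DATASOURCE = re.compile(r"""datasource\s*=\s*('|")(.*?)\1""")

def extract_datasource(tag):
    """Return the datasource attribute value, or None if it is not found."""
    m = DATASOURCE.search(tag)
    return m.group(2) if m else None

print(extract_datasource('<cfquery datasource="myDS"'))
print(extract_datasource("<cfquery datasource='myDS'"))
print(extract_datasource('''<cfquery datasource="#GetSourceName('myDS')#"'''))
# With a greedy .* this call would run on to the last double quote.
print(extract_datasource('<cfquery datasource="myDS" name="q">'))
```

The last call is where the lazy quantifier matters: a greedy .* would capture everything up to the final double quote on the line, pulling in the name attribute as well.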
Upvotes: 6
Reputation: 13198
Try looking at this post:
They seem to be dealing with the same problem.
Upvotes: 0