Reputation: 3628
My use case is as follows: I am parsing an SQL query, trying to grab a function name and the parameters passed to that function. This requires my regex to find the name, the opening parenthesis, the content, and the closing parenthesis. Unfortunately, testing showed that it is sometimes too greedy, grabbing an additional parenthesis, and at other times it misses the closing one.
Here's my test code on playground:
package main

import (
	"fmt"
	"regexp"
)

func getRegex(name string) string {
	return fmt.Sprintf("\\$__%s\\b(?:\\((.*?\\)?)\\))?", name)
}

func main() {
	var rawSQL = "(select min(time) from table where $__timeFilter(time))"
	rgx, err := regexp.Compile(getRegex("timeFilter"))
	if err != nil {
		fmt.Println(err)
	}
	var match = rgx.FindAllStringSubmatch(rawSQL, -1)
	fmt.Println(match)
}
with a live example https://go.dev/play/p/4FpZblia7Ks
The 4 cases I am testing are as follows:
(select min(time) from table where $__timeFilter(time) ) OK
(select min(time) from table where $__timeFilter(time)) NOK
select * from foo where $__timeFilter(cast(sth as timestamp)) OK
select * from foo where $__timeFilter(cast(sth as timestamp) ) NOK
here's a live regexr version https://regexr.com/700oh
I come from the JavaScript world, so I have never used recursive regexes, and it looks like this might be a case for one?
Upvotes: 1
Views: 642
Reputation: 3628
I selected Woody's answer as the correct one even though I finally had to go a different route. The attached test cases didn't cover some scenarios, and it turned out I also had to be able to extract the arguments inside the parentheses. So here's my final solution, where I manually parse the text, find the bounding parentheses, and extract whatever is between them:
// getMacroMatches extracts macro strings with their respective arguments from the sql input given
// It manually parses the string to find the closing parenthesis of the macro (because regex has no memory)
func getMacroMatches(input string, name string) ([][]string, error) {
	macroName := fmt.Sprintf("\\$__%s\\b", name)
	matchedMacros := [][]string{}
	rgx, err := regexp.Compile(macroName)
	if err != nil {
		return nil, err
	}
	// get all matching macro instances
	matched := rgx.FindAllStringIndex(input, -1)
	if matched == nil {
		return nil, nil
	}
	for matchedIndex := 0; matchedIndex < len(matched); matchedIndex++ {
		var macroEnd = 0
		var argStart = 0
		macroStart := matched[matchedIndex][0]
		inputCopy := input[macroStart:]
		cache := make([]rune, 0)
		// find the opening and closing arguments brackets
		for idx, r := range inputCopy {
			if len(cache) == 0 && macroEnd > 0 {
				break
			}
			switch r {
			case '(':
				cache = append(cache, r)
				if argStart == 0 {
					argStart = idx + 1
				}
			case ')':
				l := len(cache)
				if l == 0 {
					// unmatched ')': this break only leaves the switch, scanning continues
					break
				}
				cache = cache[:l-1]
				macroEnd = idx + 1
			default:
				continue
			}
		}
		// macroEnd equals to 0 means there are no parentheses, so just set it
		// to the end of the regex match
		if macroEnd == 0 {
			macroEnd = matched[matchedIndex][1] - macroStart
		}
		macroString := inputCopy[0:macroEnd]
		macroMatch := []string{macroString}
		args := ""
		// if opening parenthesis was found, extract contents as arguments;
		// the macroEnd check avoids an out-of-range slice when a '(' is never closed
		if argStart > 0 && macroEnd > argStart {
			args = inputCopy[argStart : macroEnd-1]
		}
		macroMatch = append(macroMatch, args)
		matchedMacros = append(matchedMacros, macroMatch)
	}
	return matchedMacros, nil
}
Go playground link: https://go.dev/play/p/-odWKMBLCBv
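For readers who only need the first occurrence of a macro, the same idea can be sketched more compactly with a depth counter instead of a rune stack. This is a minimal sketch, not the code from the playground link: the helper name `macroCall` is hypothetical, it skips the `\b` word-boundary check, and it assumes the arguments never contain parentheses inside string literals.

```go
package main

import (
	"fmt"
	"strings"
)

// macroCall is a hypothetical helper (not part of the solution above).
// It returns the full macro text and its raw argument string for the first
// occurrence of "$__"+name in input. ok is false when the macro is absent
// or its parentheses never balance.
// Simplification: unlike the regex version, there is no word-boundary check.
func macroCall(input, name string) (full, args string, ok bool) {
	prefix := "$__" + name
	start := strings.Index(input, prefix)
	if start < 0 {
		return "", "", false
	}
	rest := input[start+len(prefix):]
	// No argument list directly after the name: the macro is just its name.
	if !strings.HasPrefix(rest, "(") {
		return prefix, "", true
	}
	depth := 0
	for i, r := range rest {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				// i is the byte offset of the ')' that closes the macro.
				return prefix + rest[:i+1], rest[1:i], true
			}
		}
	}
	return "", "", false // unbalanced parentheses
}

func main() {
	cases := []string{
		"(select min(time) from table where $__timeFilter(time) )",
		"(select min(time) from table where $__timeFilter(time))",
		"select * from foo where $__timeFilter(cast(sth as timestamp))",
		"select * from foo where $__timeFilter(cast(sth as timestamp) )",
	}
	for _, c := range cases {
		full, args, ok := macroCall(c, "timeFilter")
		fmt.Printf("full=%q args=%q ok=%v\n", full, args, ok)
	}
}
```

Because the scan stops at the parenthesis that brings the depth back to zero, any outer parentheses wrapping the whole query are never reached, so no preprocessing of the input is needed.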
Upvotes: 0
Reputation: 7970
It appears that your regex has two main problems, one of which is easier to deal with than the other:
Together, these two issues cause your regex to fail on those two cases while still matching your first case.
To fix this, you'll have to do some preprocessing on the string before sending it to the regex:
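// rawSQL is the variable from the question; only strip when both outer parentheses are present
if strings.HasPrefix(rawSQL, "(") && strings.HasSuffix(rawSQL, ")") {
	rawSQL = rawSQL[1 : len(rawSQL)-1]
}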
This will strip off any outer parentheses, which a regex would not be able to ignore without memory or extra clauses.
Next, you'll want to modify your regex to handle the case where whitespace exists between your inner function call and the closing parenthesis of the $__timeFilter call:
func getRegex(name string) string {
	return fmt.Sprintf("\\$__%s\\b(\\((.*?\\)?)\\s*\\))?", name)
}
After doing this, your regex should work. You can find a full example on this playground link.
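Since the playground link did not survive here, the following is a rough sketch of how the two steps fit together against the four test cases from the question. The helper names `stripOuterParens` and `findMacro` are made up for this illustration, and the `HasSuffix` guard is an extra safety check on top of the prefix test.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// getRegex builds the modified pattern, written here as a raw string.
func getRegex(name string) string {
	return fmt.Sprintf(`\$__%s\b(\((.*?\)?)\s*\))?`, name)
}

// stripOuterParens (hypothetical helper) removes one pair of parentheses
// wrapping the whole query, if both are present.
func stripOuterParens(sql string) string {
	if strings.HasPrefix(sql, "(") && strings.HasSuffix(sql, ")") {
		return sql[1 : len(sql)-1]
	}
	return sql
}

// findMacro (hypothetical helper) returns the text of the first macro
// match in sql, or "" when there is none.
func findMacro(sql, name string) string {
	rgx := regexp.MustCompile(getRegex(name))
	return rgx.FindString(stripOuterParens(sql))
}

func main() {
	cases := []string{
		"(select min(time) from table where $__timeFilter(time) )",
		"(select min(time) from table where $__timeFilter(time))",
		"select * from foo where $__timeFilter(cast(sth as timestamp))",
		"select * from foo where $__timeFilter(cast(sth as timestamp) )",
	}
	for _, c := range cases {
		fmt.Printf("%q -> %q\n", c, findMacro(c, "timeFilter"))
	}
}
```

Note that this only handles one level of nesting inside the macro arguments and one pair of wrapping parentheses around the query; deeper nesting would need a manual scan rather than a regex.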
Upvotes: 2