Reputation: 4180
Example code:
var reStr = `"(?:\\"|[^"])*"`
var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)\s*\+\s*(` + reStr + `)\s*\+\s*(` + reStr + `)`)
var str = `"This\nis\ta\\string" +
"Another\"string" +
"Third string"
`
for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
	fmt.Println(match, "found at index", i)
	for i, str := range match {
		fmt.Println(i, str)
	}
}
Output:
["This\nis\ta\\string" +
"Another\"string" +
"Third string" "This\nis\ta\\string" "Another\"string" "Third string"] found at index 0
0 "This\nis\ta\\string" +
"Another\"string" +
"Third string"
1 "This\nis\ta\\string"
2 "Another\"string"
3 "Third string"
That is, it matches the "sum of strings" and captures all three strings correctly.
My problem is that I do not want to match the sum of exactly three strings. I want to match every "sum of strings", where the sum can consist of one or more string literals. I have tried to express this with the {0,} quantifier:
var reStr = `"(?:\\"|[^"])*"`
var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)` + `(?:\s*\+\s*(` + reStr + `)){0,}`)
var str = `
test1("This\nis\ta\\string" +
"Another\"string" +
"Third string summed");
test2("Second string " + "sum");
`
for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
	fmt.Println(match, "found at index", i)
	for i, str := range match {
		fmt.Println(i, str)
	}
}
Then I get this result:
["This\nis\ta\\string" +
"Another\"string" +
"Third string summed" "This\nis\ta\\string" "Third string summed"] found at index 0
0 "This\nis\ta\\string" +
"Another\"string" +
"Third string summed"
1 "This\nis\ta\\string"
2 "Third string summed"
["Second string " + "sum" "Second string " "sum"] found at index 1
0 "Second string " + "sum"
1 "Second string "
2 "sum"
Group 0 of the first match contains all three strings (so the regexp itself matches correctly), but the expression has only two capturing groups, and the second group holds only the last iteration of the repetition. For example, "Another\"string" is lost in the process and cannot be accessed.
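The same behavior can be reproduced in isolation. The program below is a minimal sketch with a made-up pattern and input (reRepeated is a hypothetical name): a capturing group inside a + repetition runs three times, yet FindStringSubmatch reports only one value for it, the last iteration, because Go's regexp package returns exactly one string per capturing group.

```go
package main

import (
	"fmt"
	"regexp"
)

// reRepeated has a single capturing group inside a repetition.
var reRepeated = regexp.MustCompile(`(?:([a-z]),?)+`)

func main() {
	m := reRepeated.FindStringSubmatch("a,b,c")
	// Group 0 spans all three iterations; group 1 holds only the last one.
	fmt.Printf("%q\n", m) // ["a,b,c" "c"]
}
```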
Is it possible to somehow get all iterations (all repetitions) of group 2?
I would also accept any workaround that uses nested loops. But please be aware that I cannot simply replace the {0,} repetition with an outer FindAllStringSubmatch call, because the FindAllStringSubmatch call is already used for iterating over "sums of strings". In other words, I must find the first string sum and also the "Second string sum".
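A minimal sketch of such a nested-loop workaround, assuming only the literal texts (not their byte offsets) are needed: the outer FindAllString still iterates over whole sums, and an inner FindAllString re-scans each matched sum (group 0) with a single-literal pattern to recover every operand. The names stringSums and reSingle are hypothetical. The sketch also swaps in a slightly different literal pattern, "(?:\\.|[^"\\])*", which treats a backslash plus any character as one escape so that a literal ending in an escaped backslash (such as "x\\") closes at the right quote.

```go
package main

import (
	"fmt"
	"regexp"
)

// reStr matches one double-quoted literal; an escape is a backslash plus any character.
const reStr = `"(?:\\.|[^"\\])*"`

var (
	// reStrSum matches a whole sum: one literal followed by zero or more "+ literal".
	reStrSum = regexp.MustCompile(reStr + `(?:\s*\+\s*` + reStr + `)*`)
	// reSingle matches a single literal; it is run again on each matched sum.
	reSingle = regexp.MustCompile(reStr)
)

// stringSums returns, for every sum found in src, the literals it consists of.
func stringSums(src string) [][]string {
	var sums [][]string
	// Outer loop: one iteration per sum of strings.
	for _, sum := range reStrSum.FindAllString(src, -1) {
		// Inner loop: re-scan the matched sum to recover every operand.
		sums = append(sums, reSingle.FindAllString(sum, -1))
	}
	return sums
}

func main() {
	src := `
test1("This\nis\ta\\string" +
"Another\"string" +
"Third string summed");
test2("Second string " + "sum");
`
	for i, sum := range stringSums(src) {
		fmt.Println(i, sum)
	}
}
```

For the sample input from the question, this yields two sums: one with the three literals passed to test1 and one with the two literals passed to test2.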
Upvotes: 1
Views: 88
Reputation: 4180
I just found a workaround that works: I can do two passes. In the first pass, I match all string literals and replace them with unique placeholders in the original text. The transformed text then contains no strings, which makes further processing in a second pass much easier.
Something like this:
type javaString struct {
	value  string
	lineno int
}

// First pass: find all string literals and record the line on which each one starts.
const placeholder = "JSTR"
// An escape is a backslash plus any character, so a literal ending in \\ closes correctly.
var reJavaStringLiteral = regexp.MustCompile(`"(?:\\.|[^"\\])*"`)
javaStringLiterals := make([]javaString, 0)
for _, loc := range reJavaStringLiteral.FindAllStringIndex(strContent, -1) {
	// loc[0] is this literal's own offset, so repeated literals get correct line numbers.
	lineno := strings.Count(strContent[:loc[0]], "\n") + 1
	javaStringLiterals = append(javaStringLiterals, javaString{value: strContent[loc[0]:loc[1]], lineno: lineno})
}
// Next, replace the literals, in the same left-to-right order, with JSTR(0), JSTR(1), ...
next := 0
strContent = reJavaStringLiteral.ReplaceAllStringFunc(strContent, func(string) string {
	s := fmt.Sprintf("%s(%d)", placeholder, next)
	next++
	return s
})
// Now the transformed text does not contain any string literals.
After the first pass, the original text becomes:
test1(JSTR(0) +
JSTR(1) +
JSTR(2));
test2(JSTR(3) + JSTR(4));
After this step, I can easily look for "JSTR(\d+) + JSTR(\d+) + JSTR(\d+)..." expressions. They are now easy to find, because the text no longer contains any strings (which could otherwise contain practically anything and interfere with the regular expressions). Each of these "sum of strings" matches can then be re-matched with another FindAllStringSubmatch call in an inner loop, which gives me all the information I need.
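The second pass can be sketched as follows. This is a minimal sketch under two assumptions: placeholders have the form JSTR(i), where i is the 0-based index into the slice of literals collected in the first pass, and the text "JSTR(" never occurs in the original source. resolveSums is a hypothetical helper name, and the literals in main are simply the ones from the example above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	// reSum matches JSTR(n) followed by zero or more "+ JSTR(n)".
	reSum = regexp.MustCompile(`JSTR\(\d+\)(?:\s*\+\s*JSTR\(\d+\))*`)
	// rePlaceholder captures the index inside a single placeholder.
	rePlaceholder = regexp.MustCompile(`JSTR\((\d+)\)`)
)

// resolveSums returns, for each sum in text, the original literals it refers to.
func resolveSums(text string, literals []string) [][]string {
	var out [][]string
	// Outer loop: one iteration per sum of placeholders.
	for _, sum := range reSum.FindAllString(text, -1) {
		var parts []string
		// Inner loop: one submatch per placeholder in this sum.
		for _, m := range rePlaceholder.FindAllStringSubmatch(sum, -1) {
			idx, err := strconv.Atoi(m[1])
			if err != nil || idx >= len(literals) {
				continue // skip placeholders that do not map to a collected literal
			}
			parts = append(parts, literals[idx])
		}
		out = append(out, parts)
	}
	return out
}

func main() {
	literals := []string{`"This\nis\ta\\string"`, `"Another\"string"`, `"Third string summed"`, `"Second string "`, `"sum"`}
	text := "test1(JSTR(0) +\nJSTR(1) +\nJSTR(2));\ntest2(JSTR(3) + JSTR(4));\n"
	for i, parts := range resolveSums(text, literals) {
		fmt.Println(i, parts)
	}
}
```

Placeholders whose index does not map to a collected literal are skipped rather than causing a panic; whether that is the right policy depends on the use case.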
This is not a real solution, because it requires writing a lot of code, it is specific to my concrete use case, and it does not really answer the original question: how to access all iterations of a repeated capturing group.
But the general idea of the workaround might be beneficial for somebody who is facing a similar problem.
Upvotes: 2