Reputation: 6522
I have a string that contains two keywords, "CURRENT NAME(S)" and "NEW NAME(S)", and each keyword is followed by a list of words. I want to extract the words that follow each keyword. To illustrate with code:
s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
re := regexp.MustCompile(`"CURRENT NAME(S).*",,"NEW NAME(S).*"`)
segs := re.FindAllString(s, -1)
fmt.Println("segs:", segs)
segs2 := re.FindAllStringSubmatch(s, -1)
fmt.Println("segs2:", segs2)
As you can see, the string s holds the input. "Name1, Name2" is the current names list and "NewName1,NewName2" is the new names list. I want to extract these two lists. The two lists are separated by two commas (,,). Each keyword begins with a double quote, and its list ends at the matching closing double quote.
How can I use regexp so that the program prints "Name1, Name2" and "NewName1,NewName2"?
Upvotes: 4
Views: 15731
Reputation: 1491
For a fixed format like the one in the example, you can also avoid regular expressions and parse the string explicitly, as in this example: https://play.golang.org/p/QDIyYiWJHt
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	names := []string{}
	// The two quoted fields are separated by ",,".
	parts := strings.Split(s, ",,")
	for _, part := range parts {
		part = strings.Trim(part, `"`)                     // drop the surrounding quotes
		part = strings.TrimPrefix(part, "CURRENT NAME(S)") // drop whichever keyword is present
		part = strings.TrimPrefix(part, "NEW NAME(S)")
		part = strings.TrimSpace(part) // drop the newline after the keyword
		names = append(names, part)
	}
	fmt.Println("Names:")
	for _, name := range names {
		fmt.Println(name)
	}
}
Output:
Names:
Name1, Name2
NewName1,NewName2
It uses a few more lines of code, but it may make the processing logic easier to understand at first glance.
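Since the input looks like a single CSV record whose quoted fields contain newlines, the standard library's encoding/csv package is another way to parse it explicitly. This is a minimal sketch, assuming the input really is well-formed CSV (the empty middle field comes from the ",," separator); the helper name extractNames is hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// extractNames (hypothetical helper) parses s as one CSV record and strips
// the keyword prefix from each non-empty field. It assumes s is valid CSV.
func extractNames(s string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(s))
	fields, err := r.Read() // quoted fields may contain newlines
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range fields {
		if f == "" {
			continue // the empty field produced by ",,"
		}
		f = strings.TrimPrefix(f, "CURRENT NAME(S)")
		f = strings.TrimPrefix(f, "NEW NAME(S)")
		names = append(names, strings.TrimSpace(f))
	}
	return names, nil
}

func main() {
	s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	names, err := extractNames(s)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

The CSV reader takes care of the quoting, so the commas inside "Name1, Name2" are not treated as field separators.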
Upvotes: 0
Reputation: 626747
The issue with your regex is that the input string contains newline characters, and the dot (.) in a Go regex does not match a newline by default. Another issue is that .* is a greedy pattern: it matches as many characters as it can, so it runs on to the last possible match of the text that follows it. Also, you need to escape the parentheses in the regex pattern to match literal ( and ) characters; unescaped, NAME(S) is a capturing group that only matches NAMES.
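The newline and parenthesis problems can be checked in isolation. A small sketch with made-up test strings: an unescaped (S) is a capturing group that matches just S, a plain dot stops at a newline, and the (?s) flag lets it cross one.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unescaped: (S) is a capturing group that matches just "S",
	// so this pattern matches "NAMES" but not the literal "NAME(S)".
	unescaped = regexp.MustCompile(`NAME(S)`)
	// Escaped: \( and \) match literal parentheses.
	escaped = regexp.MustCompile(`NAME\(S\)`)
	// By default "." does not match a newline...
	dot = regexp.MustCompile(`a.b`)
	// ...unless the s flag is set with (?s).
	dotAll = regexp.MustCompile(`(?s)a.b`)
)

func main() {
	fmt.Println(unescaped.MatchString("NAME(S)")) // false
	fmt.Println(unescaped.MatchString("NAMES"))   // true
	fmt.Println(escaped.MatchString("NAME(S)"))   // true
	fmt.Println(dot.MatchString("a\nb"))          // false
	fmt.Println(dotAll.MatchString("a\nb"))       // true
}
```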
The best way to solve the issue is to change .* into the negated character class [^"]* and to place it inside a pair of unescaped ( and ) to form a capturing group (a construct that captures a submatch within the match).
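To see why [^"]* is safer than .* here, compare both on a made-up input with two quoted fields: the greedy .* runs on to the last double quote, while [^"]* stops at the first one. This is only an illustration of the difference, not part of the solution itself.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy: .* prefers the longest possible run, so the group
	// extends to just before the last double quote in the input.
	greedy = regexp.MustCompile(`"(.*)"`)
	// Negated class: [^"]* cannot cross a double quote, so the group
	// ends at the first closing quote.
	negated = regexp.MustCompile(`"([^"]*)"`)
)

func main() {
	input := `"first","second"` // made-up input with two quoted fields
	fmt.Println(greedy.FindStringSubmatch(input)[1])  // first","second
	fmt.Println(negated.FindStringSubmatch(input)[1]) // first
}
```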
Here is a Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	re := regexp.MustCompile(`"CURRENT NAME\(S\)\s*([^"]*)",,"NEW NAME\(S\)\s*([^"]*)"`)
	segs2 := re.FindAllStringSubmatch(s, -1)
	if len(segs2) == 0 {
		fmt.Println("no match")
		return
	}
	// segs2[0][0] is the whole match; [1] and [2] are the capturing groups.
	fmt.Printf("segs2: [%s; %s]\n", segs2[0][1], segs2[0][2])
}
Now, the regex matches:
"CURRENT NAME\(S\) - a literal string "CURRENT NAME(S)
\s* - zero or more whitespace characters
([^"]*) - Group 1, capturing zero or more characters other than "
",,"NEW NAME\(S\) - a literal string ",,"NEW NAME(S)
\s* - zero or more whitespace characters
([^"]*) - Group 2, capturing zero or more characters other than "
" - a literal "
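The literal pieces of this pattern can also be escaped automatically with regexp.QuoteMeta, which backslash-escapes regex metacharacters such as ( and ) but leaves characters like " and , unchanged. A sketch that assembles the same pattern from its parts; the names currentKey, newKey and namesRe are just for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative names for the literal parts of the pattern.
const (
	currentKey = `"CURRENT NAME(S)`
	newKey     = `",,"NEW NAME(S)`
)

// namesRe has the same source text as
// "CURRENT NAME\(S\)\s*([^"]*)",,"NEW NAME\(S\)\s*([^"]*)"
var namesRe = regexp.MustCompile(
	regexp.QuoteMeta(currentKey) + `\s*([^"]*)` +
		regexp.QuoteMeta(newKey) + `\s*([^"]*)"`)

func main() {
	s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	fmt.Println(namesRe.String()) // the assembled pattern
	if m := namesRe.FindStringSubmatch(s); m != nil {
		fmt.Printf("current: %q\nnew: %q\n", m[1], m[2])
	}
}
```

Building the pattern this way avoids forgetting to escape a parenthesis if the keywords ever change.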
Upvotes: 4
Reputation: 9126
If the format of your input doesn't change, the simplest way is to use submatches (capturing groups). You can try something like this:
// s is the input string from the question; needs imports fmt, regexp and strings.
// (?s) is a flag that enables '.' to match newlines
var r = regexp.MustCompile(`(?s)CURRENT NAME\(S\)(.*)",,"NEW NAME\(S\)(.*)"`)
fmt.Println(r.MatchString(s))
m := r.FindSubmatch([]byte(s)) // FindSubmatch works on []byte; FindStringSubmatch is the string variant
for i, match := range m {
	str := string(match)
	fmt.Printf("Match - %d: %s\n", i, strings.Trim(str, "\n")) // remove surrounding newlines
}
Output (note that match 0 is the text matched by the whole regex, here the entire input except the leading double quote; see https://golang.org/pkg/regexp/#Regexp.FindSubmatch):
true
Match - 0: CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"
Match - 1: Name1, Name2
Match - 2: NewName1,NewName2
Example: https://play.golang.org/p/0cgBOMumtp
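A variant of this approach, assuming the same input: Go's RE2 syntax supports the lazy quantifier .*?, and adding \s* after each keyword skips the leading newline, so no strings.Trim call is needed and each group stops at the first place where the rest of the pattern can match. The names lazyRe and extract are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?s) lets "." match newlines; ".*?" is lazy, so each group stops at the
// earliest point where the rest of the pattern can still match.
var lazyRe = regexp.MustCompile(`(?s)CURRENT NAME\(S\)\s*(.*?)",,"NEW NAME\(S\)\s*(.*?)"`)

// extract returns the two name lists, or ok == false if s does not match.
func extract(s string) (current, next string, ok bool) {
	m := lazyRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	if cur, next, ok := extract(s); ok {
		fmt.Println(cur)
		fmt.Println(next)
	}
}
```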
Upvotes: 1