Reputation: 64135
I am trying to match against inputs like:
foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar
and output 6 matches: everything but the notfoo. The matches should look like foo:bar (i.e. not including leading or trailing spaces).
In general, the rules I am trying to match are: the key is foo, and a kv pair is delimited by = or :.
The current best regex I have for this is '(?:\s|^)(?P<primary>foo[:=].+?)\s', and then extracting the primary group.
The problem is that, because the \s is included in the match, we run into overlapping matches: the foo:bak foo:nospace foo:bar part breaks because the whitespace between two pairs would have to be matched twice (once as the trailing \s of one match and once as the leading \s of the next), and Go's regexp package doesn't return overlapping matches.
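A minimal reproduction of the problem, as a sketch: the names naive and findPrimary are made up for this example, and the pattern is the one quoted above. With this input only some of the six wanted pairs come back, because a space consumed as the trailing \s of one match cannot also serve as the leading \s of the next.

```go
package main

import (
	"fmt"
	"regexp"
)

// naive is the pattern from the question; each match consumes its trailing \s.
var naive = regexp.MustCompile(`(?:\s|^)(?P<primary>foo[:=].+?)\s`)

// findPrimary returns the "primary" group of every non-overlapping match.
// (findPrimary is a hypothetical helper, not part of any library.)
func findPrimary(re *regexp.Regexp, s string) []string {
	idx := re.SubexpIndex("primary")
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[idx])
	}
	return out
}

func main() {
	input := `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	// Pairs that directly follow another match are skipped, and the last
	// pair is lost because no whitespace follows it.
	fmt.Printf("%q\n", findPrimary(naive, input))
}
```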
In other regex engines I think lookahead can be used, but as far as I can tell this is not allowed with golang regex.
Is there any way to accomplish this?
Go playground link: https://play.golang.org/p/n8gnWwpiBSR
Upvotes: 0
Views: 896
Reputation: 4477
Other people have given excellent answers using regular expressions as requested. Might I be so bold as to suggest a non-regex answer?
I find that regexes are not the best solution for this situation. It is better to split the string using strings.Fields(original) to get a list of substrings. For each substring, split it based on whether it has a =, a :, or neither. The Fields() function does a great job of parsing, similar to the default split in awk, which skips multiple spaces in a row.
Working example here: https://play.golang.org/p/xXaA9skdplz
original := `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
for _, item := range strings.Fields(original) {
	if kv := strings.SplitN(item, "=", 2); len(kv) == 2 {
		fmt.Printf("key/value: %q -> %q\n", kv[0], kv[1])
	} else if kv := strings.SplitN(item, ":", 2); len(kv) == 2 {
		fmt.Printf("key/value: %q -> %q\n", kv[0], kv[1])
	} else {
		fmt.Printf("key: %q\n", item)
	}
}
Obviously you'll need to modify this code to collect the answers rather than print them.
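One way the collecting variant might look, as a sketch only: the pair type and the collectPairs function are hypothetical names, and filtering on a single wanted key is an assumption based on the question. Unlike the snippet above, it splits at whichever of = or : comes first, using strings.IndexAny.

```go
package main

import (
	"fmt"
	"strings"
)

// pair is a hypothetical result type for this sketch.
type pair struct{ Key, Value string }

// collectPairs splits s on whitespace and keeps the fields of the form
// key=value or key:value whose key equals the wanted key.
func collectPairs(s, key string) []pair {
	var out []pair
	for _, item := range strings.Fields(s) {
		// Position of the first '=' or ':' in the field, if any.
		i := strings.IndexAny(item, "=:")
		if i < 0 {
			continue // bare word without a delimiter
		}
		if item[:i] == key {
			out = append(out, pair{Key: item[:i], Value: item[i+1:]})
		}
	}
	return out
}

func main() {
	original := `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	for _, p := range collectPairs(original, "foo") {
		fmt.Printf("%q -> %q\n", p.Key, p.Value)
	}
}
```

Comparing the whole key, rather than checking a prefix, is what keeps notfoo:baz out of the result.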
If you have to use regexes, then please use the other answers.
Upvotes: 1
Reputation: 239930
There are several approaches you could take:
Just change your pattern to (?:\s|^)(?P<primary>foo[:=]\S+), as Wiktor Stribiżew mentions in a comment, instead of matching .+? up to \s. This solves the problem with no shenanigans, but I will list a few more options that might be applicable to similar problems that couldn't be so easily negated.
Since the problem is that the FindAll functions don't allow the overlap, don't use them! Instead, roll your own: use FindStringSubmatchIndex to get the boundaries of one match, extract the matched text by slicing the string, then do d = d[endIndex-1:] and loop until FindStringSubmatchIndex returns nil.
Use regexp.Split() with a pattern of \s+ to break the input string into whitespace-separated components, then just discard the ones that don't regexp.Match() on ^foo[:=]. You could even use strings.HasPrefix(s, "foo:") || strings.HasPrefix(s, "foo=") instead. The remaining ones will be your desired matches, and the whitespace around them will have already been discarded by the split. In my opinion this version conveys intent more clearly than trying to use a match.
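A rough sketch of the three approaches above, under a few assumptions: the function names viaNonSpace, viaManualLoop and viaSplit are made up for illustration, and the manual loop uses a pattern ending in (?:\s|$) rather than \s so that the last pair at the end of the input also matches. It also steps back by one only when a trailing whitespace character was actually consumed, which is a small variation on d = d[endIndex-1:].

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	nonSpaceRe = regexp.MustCompile(`(?:\s|^)(?P<primary>foo[:=]\S+)`)
	lazyRe     = regexp.MustCompile(`(?:\s|^)(?P<primary>foo[:=].+?)(?:\s|$)`)
	spacesRe   = regexp.MustCompile(`\s+`)
)

// viaNonSpace: \S+ stops before the whitespace, so nothing is shared
// between neighbouring matches.
func viaNonSpace(s string) []string {
	idx := nonSpaceRe.SubexpIndex("primary")
	var out []string
	for _, m := range nonSpaceRe.FindAllStringSubmatch(s, -1) {
		out = append(out, m[idx])
	}
	return out
}

// viaManualLoop: find one match at a time and re-slice the input so the
// trailing whitespace of one match can lead the next one.
func viaManualLoop(s string) []string {
	idx := lazyRe.SubexpIndex("primary")
	var out []string
	for d := s; ; {
		loc := lazyRe.FindStringSubmatchIndex(d)
		if loc == nil {
			return out
		}
		groupStart, groupEnd := loc[2*idx], loc[2*idx+1]
		out = append(out, d[groupStart:groupEnd])
		end := loc[1]
		if end > groupEnd {
			end-- // give the consumed whitespace back to the next search
		}
		d = d[end:]
	}
}

// viaSplit: split on whitespace, then keep fields with the wanted prefix.
func viaSplit(s string) []string {
	var out []string
	for _, f := range spacesRe.Split(s, -1) {
		if strings.HasPrefix(f, "foo:") || strings.HasPrefix(f, "foo=") {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	d := `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	fmt.Printf("%q\n%q\n%q\n", viaNonSpace(d), viaManualLoop(d), viaSplit(d))
}
```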
Upvotes: 2
Reputation: 626870
It is a pity that there is no lookaround support in Go regexp. You can work around this limitation by doubling each whitespace character (e.g. with regexp.MustCompile(`\s`).ReplaceAllString(d, "$0$0")) and then matching with (?:\s|^)(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*)(?:\s|$):
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var d = `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	d = regexp.MustCompile(`\s`).ReplaceAllString(d, "$0$0")
	r := regexp.MustCompile(`(?:\s|^)(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*)(?:\s|$)`)
	idx := r.SubexpIndex("primary")
	for _, m := range r.FindAllStringSubmatch(d, -1) {
		fmt.Printf("%q\n", m[idx])
	}
}
See the Go demo. Output (the first match keeps the doubled space between bar and baz):
"foo=bar  baz"
"foo:1"
"foo:234.mds32"
"foo:bak"
"foo:nospace"
"foo:bar"
Details:
(?:\s|^) - a whitespace or start of string
(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*) - Group "primary": foo, a colon or = char, one or more non-whitespaces, and then zero or more occurrences of one or more whitespaces and then one or more chars other than a whitespace or colon
(?:\s|$) - a whitespace or end of string.
Upvotes: 2