Yasser1984

Reputation: 2451

How to simulate negative lookbehind in Go

I'm trying to write a regex that extracts a command. Here's what I have so far, using a negative lookbehind assertion:

\b(?<![@#\/])\w.*

So with the input:

/msg @nickname #channel foo bar baz
/foo #channel @nickname foo bar baz 
foo bar baz

foo bar baz is extracted every time. See working example https://regex101.com/r/lF9aG7/3

In Go, however, this doesn't compile: http://play.golang.org/p/gkkVZgScS_

It throws:

panic: regexp: Compile(`\b(?<![@#\/])\w.*`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`

I did a bit of research and found that Go's regexp package does not support lookbehinds, so that matching is guaranteed to run in O(n) time.

How can I rewrite this regex so that it does the same without negative lookbehind?

Upvotes: 14

Views: 16037

Answers (2)

Mariano

Reputation: 6511

You can actually match the preceding character (or the beginning of line) and use a group to get the desired text in a subexpression.

Regex

(?:^|[^@#/])\b(\w+)
  • (?:^|[^@#/]) Matches either ^ the beginning of line or [^@#/] any character except @#/
  • \b A word boundary to assert the beginning of a word
  • (\w+) Captures a subexpression
    • and matches \w+ one or more word characters

Code

package main

import (
    "fmt"
    "regexp"
)

func main() {
    cmds := []string{
        `/msg @nickname #channel foo bar baz`,
        `#channel @nickname foo bar baz /foo`,
        `foo bar baz @nickname #channel`,
        `foo bar baz#channel`,
    }

    regex := regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

    // Loop over all cmds
    for _, cmd := range cmds {
        // Find all matches and subexpressions
        matches := regex.FindAllStringSubmatch(cmd, -1)

        fmt.Printf("`%v` \t==>\n", cmd)

        // Loop over all matches
        for n, match := range matches {
            // match[1] holds the text matched by the first subexpression (1st set of parentheses)
            fmt.Printf("\t%v. `%v`\n", n, match[1])
        }
    }
}

Output

`/msg @nickname #channel foo bar baz`   ==>
    0. `foo`
    1. `bar`
    2. `baz`
`#channel @nickname foo bar baz /foo`   ==>
    0. `foo`
    1. `bar`
    2. `baz`
`foo bar baz @nickname #channel`    ==>
    0. `foo`
    1. `bar`
    2. `baz`
`foo bar baz#channel`   ==>
    0. `foo`
    1. `bar`
    2. `baz`

Playground
http://play.golang.org/p/AaX9Cg-7Vx
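
The question asked for the remaining text as a single string rather than as separate words. A minimal sketch, assuming the same pattern as above, joins the captured words back together. The helper name plainWords is hypothetical and not part of the original answer, and joining with a single space will normalize any other whitespace between words:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe is the pattern from the answer: a word that is not directly
// preceded by @, # or /.
var wordRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

// plainWords (hypothetical helper) collects every captured word and
// joins them with single spaces.
func plainWords(cmd string) string {
	var words []string
	for _, m := range wordRe.FindAllStringSubmatch(cmd, -1) {
		words = append(words, m[1]) // m[1] is the first capture group
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Printf("%q\n", plainWords(`/msg @nickname #channel foo bar baz`))
	fmt.Printf("%q\n", plainWords(`foo bar baz#channel`))
}
```

Both calls in main should print "foo bar baz".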

Upvotes: 2

hjpotter92

Reputation: 80649

Since your negative lookbehind only uses a simple character set, you can replace it with a negated character set:

\b[^@#/]\w.*

If such words can also appear at the start of the string, add the ^ anchor as an alternative:

(?:^|[^@#\/])\b\w.*
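
Unlike a lookbehind, the non-capturing group consumes the character before the word, so the full match can begin with a space. A minimal sketch, assuming you only want the text from the first plain word onward, wraps the tail in a capture group and reads submatch 1. The name extractTail is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// tailRe is the pattern above with the tail wrapped in a capture group,
// so the consumed leading character is left out of the result.
var tailRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w.*)`)

// extractTail (hypothetical helper) returns the text starting at the first
// word that is not prefixed by @, # or /, or "" if there is no such word.
func extractTail(line string) string {
	m := tailRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	lines := []string{
		"/msg @nickname #channel foo bar baz",
		"foo bar baz",
	}
	for _, line := range lines {
		fmt.Printf("%q -> %q\n", line, extractTail(line))
	}
}
```

Trailing whitespace in the input is kept by .*, so wrap the result in strings.TrimSpace if that matters.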

Based on the samples in the Go playground link in your question, I think you're looking to filter out all words beginning with one of the characters #, @ or /. You can use a filter function:

// Filter returns the elements of vs for which f returns true.
func Filter(vs []string, f func(string) bool) []string {
    vsf := make([]string, 0)
    for _, v := range vs {
        if f(v) {
            vsf = append(vsf, v)
        }
    }
    return vsf
}

and a Process function, which makes use of the filter above:

// Process drops every space-separated word that starts with #, @ or /.
func Process(inp string) string {
    t := strings.Split(inp, " ")
    t = Filter(t, func(x string) bool {
        return !strings.HasPrefix(x, "#") &&
            !strings.HasPrefix(x, "@") &&
            !strings.HasPrefix(x, "/")
    })
    return strings.Join(t, " ")
}

It can be seen in action on the playground at http://play.golang.org/p/ntJRNxJTxo
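
As a variation on the same idea, here is a compact sketch that splits with strings.Fields instead of strings.Split, so repeated or trailing spaces do not produce empty words. The function name stripPrefixed is hypothetical, and this is only one possible way to write it, not the code from the playground link:

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefixed (hypothetical variant of Process) keeps only the words
// that do not start with #, @ or /.
func stripPrefixed(inp string) string {
	var kept []string
	// strings.Fields never returns empty strings, so w[0] is always valid.
	for _, w := range strings.Fields(inp) {
		if strings.IndexByte("#@/", w[0]) < 0 {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Printf("%q\n", stripPrefixed("/msg @nickname #channel foo bar baz"))
	fmt.Printf("%q\n", stripPrefixed("/foo #channel @nickname foo bar baz "))
}
```

Both calls in main should print "foo bar baz", including for the input with a trailing space.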

Upvotes: 5
