n00dl3

Reputation: 21564

Golang: Why does regexp.FindAllStringSubmatch() return [][]string and not []string?

I am kind of new to Go, and this is the first time I have had to deal with regexp.

I am a bit surprised that someRegex.FindAllStringSubmatch("somestring", -1) returns a slice of slices, [][]string, instead of a simple slice of strings, []string.

Example:

someRegex, _ := regexp.Compile("^.*(mes).*$")
matches := someRegex.FindAllStringSubmatch("somestring", -1)
fmt.Println(matches) // logs [[somestring mes]]

What is the reason for this behavior? I can't figure it out.

Upvotes: 5

Views: 12907

Answers (2)

Wiktor Stribiżew

Reputation: 626747

The func (*Regexp) FindAllStringSubmatch extracts matches and captured submatches.

A submatch is the part of the text matched by a part of the regex enclosed in a pair of unescaped parentheses (a so-called capturing group).

In your case, ^.*(mes).*$ matches:

  • ^ - start of string
  • .* - zero or more characters other than a newline, as many as possible
  • (mes) - Capturing group 1: a mes substring
  • .*$ - the rest of the string.

So, the match value is the whole string, and it is the first value in the output. Since the pattern also contains a capturing group, there must be a place for it in the results; hence, mes is the second item in the list.

Since there may be more than one match, we need a list of lists: one inner list per match.

A better example is one that produces several matches, each with submatches (and an optional group, too):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // MustCompile panics on an invalid pattern instead of silently discarding the error.
    someRegex := regexp.MustCompile(`[^aouiye]([aouiye])([^aouiye])?`)
    matches := someRegex.FindAllStringSubmatch("somestri", -1)
    fmt.Printf("%q\n", matches)
}

The [^aouiye]([aouiye])([^aouiye])? pattern matches a non-vowel, then a vowel, then an optional non-vowel, capturing the vowel into Group 1 and the optional trailing non-vowel into Group 2.

The results are [["som" "o" "m"] ["ri" "i" ""]]. There are 2 matches, and each contains a match value, Group 1 value and Group 2 value. Since the ri match has no text captured into Group 2 (([^aouiye])?), it is empty, but it is still there since the group is defined in the regex pattern.

Upvotes: 10

Lajos Arpad

Reputation: 76434

FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

Docs.

To sum up: you need a slice of slices of strings ([][]string) because this is the 'All' version of FindStringSubmatch. FindStringSubmatch itself returns a single slice of strings ([]string) for the first match only.

Upvotes: 3
