Randy Layman

Reputation: 153

Golang Regexp Named Groups and Submatches

I am trying to match a regular expression and get the capturing group name for the match. This works when the regular expression only matches the string once, but if it matches the string more than once, SubexpNames doesn't return the duplicated names.

Here's an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
    fmt.Printf("%q\n", re.SubexpNames())
    fmt.Printf("%q\n", re.FindAllStringSubmatch("Alan Turing ", -1))
}

The output is:

["" "first"]
[["Alan " "Alan"] ["Turing " "Turing"]]

Is it possible to get the capturing group name for each submatch?

Upvotes: 15

Views: 17570

Answers (3)

m-szalik

Reputation: 3654

An example of running the ping command on Linux and parsing its output with regexp submatches.

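import (
    "errors"
    "fmt"
    "os/exec"
    "regexp"
    "strconv"
    "time"
)

// Result holds the summary statistics parsed from ping's output.
type Result struct {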
    AvgTime     time.Duration
    MaxTime     time.Duration
    MinTime     time.Duration
    MDevTime    time.Duration
    Transmitted int
    Received    int
}

func PingHostOrIp(hostOrIp string, pingCount int, timeout time.Duration) (*Result, error) {
    timeoutSec := int(timeout.Seconds())
    // Pass each flag and its value as separate arguments; exec.Command does not split on spaces.
    outBuff, err := exec.Command("ping", "-q", "-c", fmt.Sprint(pingCount), "-w", fmt.Sprint(timeoutSec), hostOrIp).Output()
    if err != nil {
        return nil, err
    }
    out := string(outBuff)
    reg := regexp.MustCompile(`(\d+) packets transmitted, (\d+) received, \d+% packet loss, time .+\nrtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms`)
    subMatches := reg.FindStringSubmatch(out)
    if subMatches == nil {
        return nil, errors.New(out)
    }
    res := Result{
        AvgTime:     toDuration(subMatches[4]),
        MaxTime:     toDuration(subMatches[5]),
        MinTime:     toDuration(subMatches[3]),
        MDevTime:    toDuration(subMatches[6]),
        Transmitted: toInt(subMatches[1]),
        Received:    toInt(subMatches[2]),
    }
    return &res, nil
}

func toInt(str string) int {
    i, err := strconv.Atoi(str)
    if err != nil {
        panic(err)
    }
    return i
}

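// toDuration converts a value that ping reports in milliseconds (e.g. "0.045") to a time.Duration.
func toDuration(str string) time.Duration {
    f, err := strconv.ParseFloat(str, 64)
    if err != nil {
        panic(err)
    }
    return time.Duration(f * float64(time.Millisecond))
}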

Upvotes: 1

VonC

Reputation: 1323773

This was originally expected for Go 1.14 (Q1 2020, not confirmed at the time).
See "proposal: regexp: add (*Regexp).SubexpIndex #32420". Update: it was included with commit 782fcb4 in Go 1.15 (August 2020).

// SubexpIndex returns the index of the first subexpression with the given name,
// or else -1 if there is no subexpression with that name.
//
// Note that multiple subexpressions can be written using the same name, as in
// (?P<bob>a+)(?P<bob>b+), which declares two subexpressions named "bob".
// In this case SubexpIndex returns the index of the leftmost such subexpression
// in the regular expression.
func (*Regexp) SubexpIndex(name string) int

This is discussed in CL 187919.

re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
fmt.Println(re.MatchString("Alan Turing"))
matches := re.FindStringSubmatch("Alan Turing")
lastIndex := re.SubexpIndex("last")
fmt.Printf("last => %d\n", lastIndex)
fmt.Println(matches[lastIndex])

// Output:
// true
// last => 2
// Turing

Upvotes: 10

alex vasi

Reputation: 5344

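Group names and positions are fixed for a compiled regexp, so the slice returned by SubexpNames lines up index for index with every submatch slice, no matter how many times the pattern matches: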

re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
groupNames := re.SubexpNames()
for matchNum, match := range re.FindAllStringSubmatch("Alan Turing ", -1) {
    for groupIdx, group := range match {
        name := groupNames[groupIdx]
        if name == "" {
            name = "*"
        }
        fmt.Printf("#%d text: '%s', group: '%s'\n", matchNum, group, name)
    }
}

Upvotes: 13
