Milo Banks

Reputation: 549

Split string along regex, but keep matches

I want to split a string on a regular expression, but preserve the matches.

I have tried splitting the string on a regex, but that throws away the matches. I have also tried using this, but I am not very good at translating code from one language to another, let alone from C#.

re := regexp.MustCompile(`\d`)
array := re.Split("ab1cd2ef3", -1)

I need the value of array to be ["ab", "1", "cd", "2", "ef", "3"], but the value of array is ["ab", "cd", "ef"]. No errors.

Upvotes: 4

Views: 2290

Answers (4)

Zombo

Reputation: 1

You can use a bufio.Scanner:

package main

import (
   "bufio"
   "strings"
)

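// digit is a bufio.SplitFunc that returns each digit as its own token
// and each run of non-digits before a digit as a token.
func digit(data []byte, eof bool) (int, []byte, error) {
   for i, b := range data {
      if '0' <= b && b <= '9' {
         if i > 0 {
            return i, data[:i], nil
         }
         return 1, data[:1], nil
      }
   }
   // No digit left: at EOF, emit the remaining text as the final token.
   if eof && len(data) > 0 {
      return len(data), data, nil
   }
   // Otherwise request more data.
   return 0, nil, nil
}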

func main() {
   s := bufio.NewScanner(strings.NewReader("ab1cd2ef3"))
   s.Split(digit)
   for s.Scan() {
      println(s.Text())
   }
}

https://golang.org/pkg/bufio#Scanner.Split

Upvotes: 1

sahaj

Reputation: 842

The kind of regex support used in the link you pointed to is NOT available in Go's regexp package. You can read the related discussion.

What you want to achieve (as per the sample given) can be done with a regex that matches either a single digit or a run of non-digits.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "ab1cd2ef3"
    r := regexp.MustCompile(`(\d|[^\d]+)`)
    fmt.Println(r.FindAllStringSubmatch(str, -1))
}

Playground: https://play.golang.org/p/L-ElvkDky53

Output:

[[ab ab] [1 1] [cd cd] [2 2] [ef ef] [3 3]]

Upvotes: 2

Jian-Hua He

Reputation: 1

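A dumb solution: add a separator around each match in the string, then split on the separator. This assumes the separator never occurs in the input, and it leaves an empty trailing element when the input ends with a match.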

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    input := "ab1cd2ef3"
    sep := "|"

    indexes := re.FindAllStringIndex(input, -1)
    fmt.Println(indexes)

    move := 0
    for _, v := range indexes {
        p1 := v[0] + move
        p2 := v[1] + move
        input = input[:p1] + sep + input[p1:p2] + sep + input[p2:]
        move += 2
    }

    result := strings.Split(input, sep)

    fmt.Println(result)
}

Upvotes: 0

Gustavo Paiva

Reputation: 156

I don't think this is possible with the current regexp package, but Split could easily be extended to support this behavior.

This should work for your case:

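// Split slices s around the matches of re and keeps the matches
// themselves, in order. A negative n means no limit; n == 0 returns nil.
// Requires the "regexp" import.
func Split(re *regexp.Regexp, s string, n int) []string {
    if n == 0 {
        return nil
    }

    matches := re.FindAllStringIndex(s, n)
    parts := make([]string, 0, 2*len(matches)+1)

    beg := 0
    for _, match := range matches {
        if n > 0 && len(parts) >= n-1 {
            break
        }

        // Append the text before the match, skipping empty pieces.
        if match[0] > beg {
            parts = append(parts, s[beg:match[0]])
        }
        // Append the match itself, skipping empty matches.
        if match[1] > match[0] {
            parts = append(parts, s[match[0]:match[1]])
        }
        beg = match[1]
    }

    // Append whatever follows the last match.
    if beg < len(s) {
        parts = append(parts, s[beg:])
    }

    return parts
}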

Upvotes: 0
