Junaid s

Reputation: 89

Splitter in Golang

Below is the Java code, I need something similar in Go:

List<String> tokens = Lists.newArrayList(Splitter.on(CharMatcher.anyOf("[]//"))
    .trimResults().omitEmptyStrings().split(entry.getValue()));

This is what I have tried:

re := regexp.MustCompile(`[//]`)
tokens := re.Split(entry, -1)

Upvotes: 3

Views: 538

Answers (1)

icza

Reputation: 417582

Using regexp is usually slower than doing it manually. Since the task is not complex, the non-regexp solution isn't complicated either.

You may use strings.FieldsFunc() to split a string on a set of characters, and strings.TrimSpace() to strip leading and trailing whitespace.

Here's a simple function doing what you want:

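// split cuts s at every rune found in sep, trims whitespace from each
// piece, and drops the pieces that end up empty.
func split(s, sep string) (tokens []string) {
    // FieldsFunc splits wherever the callback returns true.
    fields := strings.FieldsFunc(s, func(r rune) bool {
        return strings.IndexRune(sep, r) != -1
    })
    for _, s2 := range fields {
        s2 = strings.TrimSpace(s2)
        if s2 != "" {
            tokens = append(tokens, s2)
        }
    }
    return
}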

Testing it:

fmt.Printf("%q\n", split("a,b;c, de; ; fg ", ",;"))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", "[]/"))

Output:

["a" "b" "c" "de" "fg"]
["a" "b" "c" "de" "fg"]

Improvements

If performance matters and split() is called many times, it pays to build a set-like map from the separator characters once and reuse it. The function passed to strings.FieldsFunc() can then simply check whether the rune is in the map, with no need to call strings.IndexRune() to decide whether a given rune is a separator.

The performance gain might not be significant if you have only a few separator characters (say 1-3), but with many more, using a map could significantly improve performance.

This is what it could look like:

var (
    sep1 = map[rune]bool{',': true, ';': true}
    sep2 = map[rune]bool{'[': true, ']': true, '/': true}
)

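// split cuts s at every rune present in the sep set, trims whitespace
// from each piece, and drops the pieces that end up empty.
func split(s string, sep map[rune]bool) (tokens []string) {
    fields := strings.FieldsFunc(s, func(r rune) bool {
        return sep[r] // a missing key yields false
    })
    for _, s2 := range fields {
        s2 = strings.TrimSpace(s2)
        if s2 != "" {
            tokens = append(tokens, s2)
        }
    }
    return
}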

Testing it:

fmt.Printf("%q\n", split("a,b;c, de; ; fg ", sep1))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", sep2))

The output is the same.

Upvotes: 5
