Malcolm

Reputation: 2489

Is there an equivalent to Java's String intern function in Go?

I am parsing a lot of text input that has repeating patterns (tags). I would like to be memory efficient about it and store pointers to a single string for each tag, instead of multiple strings for each occurrence of a tag.

Upvotes: 4

Views: 2278

Answers (3)

zangw

Reputation: 48346

Since Go 1.23, the new unique package provides unique.Handle, which can be used for string interning in Go.

The unique package maintains an internal mapping of values to globally unique symbols. To keep that mapping from growing without bound, an entry becomes eligible for garbage collection once no Handle refers to its value.

Sample code: https://go.dev/play/p/t-sSGFxF632?v=gotip

package main

import (
    "fmt"
    "runtime"
    "unique"
)

var words = []string{
    "foofoooabqwryuopsdfghjkcvbnmm",
    "barbarzxcvbnm123456789wdfghjk",
    "234456789sdfghjklcvbnmadddddd",
    "barbarzxcvbnm123456789wdfghjk",
    "234456789sdfghjklcvbnmadddddd",
}

const N = 1000000

func getAlloc() uint64 {
    var m runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&m)
    return m.Alloc
}

func test() {
    before := getAlloc()
    a := make([]string, N)
    for i := 0; i < len(a); i++ {
        a[i] = words[i%len(words)]
    }
    fmt.Printf("test memory: %v \n", getAlloc()-before)
}

func test_with_unique() {
    before := getAlloc()
    a := make([]unique.Handle[string], N)
    for i := 0; i < len(a); i++ {
        a[i] = unique.Make(words[i%len(words)])
    }
    fmt.Printf("test with unique memory: %v \n", getAlloc()-before)
}

func main() {
    test()
    runtime.GC()
    test_with_unique()
}

Output

test memory: 9984 
test with unique memory: 1344 
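
The benchmark above only compares allocation sizes. As a minimal sketch of the API itself (assuming Go 1.23 or later; the internTag helper name is made up for illustration), equal strings yield equal handles, and Value returns the canonical string:

```go
package main

import (
	"fmt"
	"unique"
)

// internTag is a hypothetical helper: it returns the canonical handle for a tag.
func internTag(tag string) unique.Handle[string] {
	return unique.Make(tag)
}

func main() {
	a := internTag("title")
	b := internTag(string([]byte("title"))) // separately allocated copy of the same text
	c := internTag("body")

	fmt.Println(a == b)    // true: equal content gives equal handles
	fmt.Println(a == c)    // false: different content gives different handles
	fmt.Println(b.Value()) // title
}
```

Comparing two handles is a pointer comparison, so it stays cheap no matter how long the tags are.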

Upvotes: 3

zzzz

Reputation: 91193

I think that, for example, Pool and GoPool may fulfill your needs. That code solves one thing that Stephen's solution ignores: in Go, a string value may be a slice of a bigger string, and interning such a slice keeps the whole bigger string in memory. There are scenarios where that doesn't matter and scenarios where it is a show stopper. The linked functions attempt to be on the safe side.

Upvotes: 2

Stephen Weinberg

Reputation: 53398

No such function exists that I know of. However, you can make your own very easily using maps. A string value is itself just a pointer and a length, so a string assigned from another string takes up only two words. Therefore, all you need to do is ensure that no two strings hold redundant copies of the same content.

Here is an example of what I mean.

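// Interner maps string content to one canonical copy of that content.
type Interner map[string]string

// NewInterner returns an empty Interner.
func NewInterner() Interner {
    return Interner(make(map[string]string))
}

// Intern returns the canonical copy of s, storing s itself on first sight.
func (m Interner) Intern(s string) string {
    if ret, ok := m[s]; ok {
        return ret
    }

    m[s] = s
    return s
}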

This code will deduplicate redundant strings whenever you do the following:

str = interner.Intern(str)

EDIT: As jnml mentioned, my answer could pin memory depending on the string it is given. There are two ways to solve this problem. Either one should be inserted before m[s] = s in my previous example. The first copies the string twice; the second uses unsafe. Neither is ideal.

Double copy:

b := []byte(s)
s = string(b)

Unsafe (use at your own risk; works with the current version of the gc compiler):

b := []byte(s)
s = *(*string)(unsafe.Pointer(&b))
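
On newer toolchains the copy can be written without the double copy or unsafe. This is a sketch, assuming Go 1.18 or later for strings.Clone; it restates the Interner type from above so that the block compiles on its own:

```go
package main

import (
	"fmt"
	"strings"
)

// Interner maps string content to one canonical copy of that content.
type Interner map[string]string

// Intern returns the canonical copy of s, storing a detached copy on first sight.
func (m Interner) Intern(s string) string {
	if ret, ok := m[s]; ok {
		return ret
	}
	s = strings.Clone(s) // detach from whatever larger string s was sliced from
	m[s] = s
	return s
}

func main() {
	in := Interner{}
	input := "<b> <i> <b> <b> <i>" // stand-in for parsed text
	for _, tag := range strings.Fields(input) {
		_ = in.Intern(tag)
	}
	fmt.Println(len(in)) // 2 distinct tags stored
}
```

The clone happens only on the first occurrence of each tag, so repeated tags cost one map lookup and no allocation.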

Upvotes: 5
