Reputation: 2489
Is there an equivalent to Java's String intern function in Go?
I am parsing a lot of text input that has repeating patterns (tags). I would like to be memory efficient about it and store pointers to a single string for each tag, instead of multiple strings for each occurrence of a tag.
Upvotes: 4
Views: 2278
Reputation: 48346
After GO 1.23, unique: new package with unique.Handle will be function as string interning in Go.
The unique package must maintain an internal mapping of values to globally unique symbols, which risks growing without bound.
Sample codes https://go.dev/play/p/t-sSGFxF632?v=gotip
import (
"fmt"
"runtime"
"unique"
)
var words = []string{
"foofoooabqwryuopsdfghjkcvbnmm",
"barbarzxcvbnm123456789wdfghjk",
"234456789sdfghjklcvbnmadddddd",
"barbarzxcvbnm123456789wdfghjk",
"234456789sdfghjklcvbnmadddddd",
}
const N = 1000000
func getAlloc() uint64 {
var m runtime.MemStats
runtime.GC()
runtime.ReadMemStats(&m)
return m.Alloc
}
func test() {
before := getAlloc()
a := make([]string, N)
for i := 0; i < len(a); i++ {
a[i] = words[i%len(words)]
}
fmt.Printf("test memory: %v \n", getAlloc()-before)
}
func test_with_unique() {
before := getAlloc()
a := make([]unique.Handle[string], N)
for i := 0; i < len(a); i++ {
a[i] = unique.Make(words[i%len(words)])
}
fmt.Printf("test with unique memory: %v \n", getAlloc()-before)
}
func main() {
test()
runtime.GC()
test_with_unique()
}
Output
test memory: 9984
test with unique memory: 1344
Upvotes: 3
Reputation: 91193
I think that for example Pool and GoPool may fulfill your needs. That code solves one thing which Stephen's solution ignores. In Go, a string value may be a slice of a bigger string. Scenarios are where it doesn't matter and scenarios are where that is a show stopper. The linked functions attempt to be on the safe side.
Upvotes: 2
Reputation: 53398
No such function exists that I know of. However, you can make your own very easily using maps. The string type itself is a uintptr and a length. So, a string assigned from another string takes up only two words. Therefore, all you need to do is ensure that there are no two strings with redundant content.
Here is an example of what I mean.
type Interner map[string]string
func NewInterner() Interner {
return Interner(make(map[string]string))
}
func (m Interner) Intern(s string) string {
if ret, ok := m[s]; ok {
return ret
}
m[s] = s
return s
}
This code will deduplicate redundant strings whenever you do the following:
str = interner.Intern(str)
EDIT: As jnml mentioned, my answer could pin memory depending on the string it is given. There are two ways to solve this problem. Both of these should be inserted before m[s] = s
in my previous example. The first copies the string twice, the second uses unsafe. Neither are ideal.
Double copy:
b := []byte(s)
s = string(b)
Unsafe (use at your own risk. Works with current version of gc compiler):
b := []byte(s)
s = *(*string)(unsafe.Pointer(&b))
Upvotes: 5