Reputation: 187
I should probably explain why I want this first.
As I understand it, in Go, substring expressions (s[i:j]), strings.Split, and some other string operations work in place: the resulting substrings share the memory block of the original string.
For example, I read a large string, parse it, and keep a few substrings from it for the long run in a server program. Those substrings keep the whole large memory block reachable, so the GC cannot free it, which wastes memory. I assume that if I made copies of those substrings and kept the copies instead, the GC could free the large string.
But I can't find a string copy mechanism in Go. I tried converting the substring to []byte and then back to string, and memory usage dropped by roughly 3/4 in my particular use case.
But this doesn't feel right. First, it introduces two copy operations. Second, since I never actually write to that byte slice, I suspect the conversion might get optimized out in release builds.
I can't imagine this hasn't been asked before, but my search didn't yield any relevant results. Is there a better practice for this kind of thing in Go?
BTW, I tried appending an empty string (+"") to it, but memory consumption didn't drop; I assume it got optimized out even in test builds.
To measure memory usage, I call runtime.GC(), then runtime.ReadMemStats(), and compare MemStats.Alloc, which seems pretty consistent in my tests.
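The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual program: the helper names heapAlloc and firstField, the line format, and the sizes are all made up for the example, and strings.Cut and strings.Clone assume Go 1.18 or later. The exact numbers printed will vary between runs and Go versions.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// heapAlloc forces a garbage collection and returns the number of bytes
// of live heap objects (MemStats.Alloc). Hypothetical helper.
func heapAlloc() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc
}

// firstField returns the text before the first comma. With clone=false the
// result is a plain slice of line, so it keeps line's whole buffer reachable.
// With clone=true the result is an independent copy.
func firstField(line string, clone bool) string {
	field, _, _ := strings.Cut(line, ",")
	if clone {
		return strings.Clone(field)
	}
	return field
}

func main() {
	for _, clone := range []bool{false, true} {
		before := heapAlloc()
		kept := make([]string, 0, 100)
		for i := 0; i < 100; i++ {
			// A short id followed by 64 KiB of payload we do not want to keep.
			line := fmt.Sprintf("id%d,", i) + strings.Repeat("x", 1<<16)
			kept = append(kept, firstField(line, clone))
		}
		after := heapAlloc()
		fmt.Printf("clone=%v retained about %d KiB\n", clone, (int64(after)-int64(before))/1024)
		runtime.KeepAlive(kept) // keep the substrings alive until after the measurement
	}
}
```

On a typical run the clone=false case should retain several megabytes and the clone=true case only a few kilobytes, which matches the effect described in the question.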
Upvotes: 12
Views: 20619
Reputation: 47
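// Sub returns a copy of pStr[pFrom:pTo] (requires import "strings", Go 1.18+).
// If pTo is less than pFrom or not below len(pStr), the result runs to the end of pStr.
// Sub("alfa",0,1)="a"; Sub("alfa",1,3)="lf"; Sub("alfa",0,0)="alfa"; Sub("alfa",2,0)="fa"; Sub("alfa",2,999)="fa";
func Sub(pStr string, pFrom, pTo int) string {
	c := len(pStr)
	// Also reject pFrom > c, which would otherwise panic when slicing.
	if c == 0 || pFrom < 0 || pFrom > c {
		return ""
	}
	if pTo == 0 && pFrom == 0 {
		return pStr
	}
	var t string
	if pTo < pFrom || pTo >= c {
		t = pStr[pFrom:]
	} else {
		t = pStr[pFrom:pTo]
	}
	return strings.Clone(t)
}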
Upvotes: -1
Reputation: 1328152
The alternative would be, starting with Go 1.18 (March 2022) for strings.Clone (bytes.Clone followed in Go 1.20):
sCopy := strings.Clone(s)
It comes from issues 40200 and 45038, and starts in CLs (change lists) 334884 and 345849
bytes, strings: add Clone
Working directly with []byte and string are common and needing to copy them comes up fairly often.
This change adds a Clone helper to both strings and bytes to fill this need.
A benchmark was also added to provide evidence for why bytes.Clone was implemented with copy.
strings: add Clone function
The new strings.Clone function copies the input string without the returned cloned string referencing the input string's memory.
See strings/clone.go
// Clone returns a fresh copy of s.
// It guarantees to make a copy of s into a new allocation,
// which can be important when retaining only a small substring
// of a much larger string. Using Clone can help such programs
// use less memory. Of course, since using Clone makes a copy,
// overuse of Clone can make programs use more memory.
// Clone should typically be used only rarely, and only when
// profiling indicates that it is needed.
// For strings of length zero the string "" will be returned
// and no allocation is made.
func Clone(s string) string {
	if len(s) == 0 {
		return ""
	}
	b := make([]byte, len(s))
	copy(b, s)
	return unsafe.String(&b[0], len(b))
}
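As a usage illustration, here is a minimal sketch of the situation the doc comment describes: keeping only a small piece of each larger string. The function name keepKeys and the "key=value" line format are invented for this example; strings.Cut and strings.Clone are real standard library functions (Go 1.18+).

```go
package main

import (
	"fmt"
	"strings"
)

// keepKeys extracts the key of each "key=value" line and returns copies
// that do not reference the memory of the input lines.
// Hypothetical helper for illustration.
func keepKeys(lines []string) []string {
	keys := make([]string, 0, len(lines))
	for _, line := range lines {
		key, _, ok := strings.Cut(line, "=")
		if !ok {
			continue // skip lines without '='
		}
		// Without Clone, key would keep the whole line reachable.
		keys = append(keys, strings.Clone(key))
	}
	return keys
}

func main() {
	lines := []string{"host=" + strings.Repeat("a", 4096), "malformed", "port=8080"}
	fmt.Println(keepKeys(lines)) // prints [host port]
}
```

As the doc comment warns, cloning every string is wasteful; it pays off mainly when the retained piece is much smaller than the string it was cut from.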
Upvotes: 16
Reputation: 4072
Another way to clone a string is to concatenate it with a non-empty string, which allocates a new underlying buffer, and then take a substring of the result:
cloned := (s + ".")[:len(s)]
Upvotes: 1
Reputation: 67
Use the following function to deep copy a string:
func deepCopy(s string) string {
	b := make([]byte, len(s))
	copy(b, s)
	return *(*string)(unsafe.Pointer(&b))
}
The function copies the data to a newly allocated slice of bytes. The function uses the unsafe package to convert the slice header to a string header with no copying of the bytes.
If direct use of the unsafe package is a concern, then use strings.Builder. The strings.Builder type executes the unsafe shenanigans under the covers.
func deepCopy(s string) string {
	var sb strings.Builder
	sb.WriteString(s)
	return sb.String()
}
There's no need to check the error returned from sb.WriteString. The Builder.WriteString method has an error return so that the Builder type satisfies the io.StringWriter interface, not because WriteString can return a non-nil error.
Upvotes: 5
Reputation: 3689
Strings in Go are immutable once created (see the Go spec).
I prefer strings.Builder, as below. You keep adding to the builder's buffer (mutably) with WriteString, and once done you call the String method, which returns a string that points at that buffer rather than another copy of it.
somestring := "Hello Go"
var sb strings.Builder
if _, err := sb.WriteString(somestring); err != nil {
	// log & return
}
newstring := sb.String()
Check the implementation of Builder.String in the Go source. It reinterprets the buffer's slice header as a string header through a *string cast, so there is no second copy.
// String returns the accumulated string.
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
Upvotes: 6
Reputation: 51632
The string is implemented as a pointer to the underlying byte array and the length of the string. When you create a slice from an existing string, the new string still points to the underlying array, possibly to a different offset in that array, with a different length. That way, many small strings can use the single underlying large array.
As you noted, if you have a large string and you parse it to get smaller strings, you end up keeping the large string in memory, because the GC only knows about the underlying array and pointers to it. There are two ways you can deal with this:
1. Read the input as []byte, or use a byte-stream based reader/scanner, and create strings from the input as you parse. That way the GC will collect the underlying []byte when parsing is done, and you will have your strings without the underlying large block.
2. Copy the substrings you want to keep, for example with string([]byte(s[x:y])), or by using copy.
Upvotes: 4