JimmyZ

Reputation: 187

How to (deep) copy a string in Go?

I should probably explain why I would want that first.

I understand that in Go, substring expressions (s[i:j]), strings.Split, and some other string operations do not copy: the resulting substrings share the memory block of the original string.

For example, I read a large string, parse it, and get a few substrings from it that are kept for a long time in a server program. Those substrings keep the large memory block from being garbage collected, which wastes memory. I assume that if I could copy those substrings and keep only the copies, the GC could free the large string.

But I can't find a string copy mechanism in Go. I tried converting the substring to []byte and then back to string, and memory usage dropped by roughly 3/4 in my particular use case.

But this doesn't feel right. First, it introduces two copy operations. Second, since I never actually write to that byte slice, I suspect it might get optimized out in release builds.

I can't imagine this hasn't been asked before, but my search didn't yield any relevant results. Is there a better practice for this kind of thing in Go?

BTW, I tried appending an empty string (+"") to it, but memory consumption doesn't drop; I assume it got optimized out even in test builds.

For measuring memory usage, I call runtime.GC() then runtime.ReadMemStats() and compare MemStats.Alloc, which seems pretty consistent in my tests.
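
A minimal sketch of that measurement, for readers who want to reproduce it. The helper names heapAlloc and keepPrefix and the package-level slice kept are hypothetical; kept stands in for whatever long-lived state the server holds.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// kept is a hypothetical stand-in for long-lived server state.
var kept []string

// heapAlloc runs a full garbage collection and returns the number of
// bytes of heap objects that are still allocated afterwards.
func heapAlloc() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc
}

// keepPrefix builds a string of size bytes (size must be at least 8)
// and retains only its first 8 bytes. The retained substring still
// points into the large allocation.
func keepPrefix(size int) {
	big := strings.Repeat("x", size)
	kept = append(kept, big[:8])
}

func main() {
	before := heapAlloc()
	keepPrefix(32 << 20) // 32 MiB
	after := heapAlloc()
	fmt.Printf("heap grew by about %d MiB while keeping %d bytes\n",
		(int64(after)-int64(before))>>20, len(kept[0]))
}
```

Replacing big[:8] with a copied substring inside keepPrefix should make the reported growth drop to roughly zero, which is the effect described above.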

Upvotes: 12

Views: 20619

Answers (6)

Jan Tungli

Reputation: 47

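// Sub returns a copy of pStr[pFrom:pTo] that does not share memory with pStr.
// If pTo is before pFrom or past the end, the substring runs to the end of pStr.
// Requires the "strings" package and Go 1.18+ for strings.Clone.
func Sub(pStr string, pFrom, pTo int) string {
    // Sub("alfa",0,1)="a"; Sub("alfa",1,3)="lf"; Sub("alfa",0,0)="alfa"; Sub("alfa",2,0)="fa"; Sub("alfa",2,999)="fa";
    c := len(pStr)
    if c == 0 || pFrom < 0 || pFrom > c {
        return "" // empty input or start out of range (pFrom > c would panic when slicing)
    }
    if pTo == 0 && pFrom == 0 {
        return pStr // whole string requested: returned as-is, without copying
    }
    var t string
    if pTo < pFrom || pTo >= c {
        t = pStr[pFrom:]
    } else {
        t = pStr[pFrom:pTo]
    }
    return strings.Clone(t)
}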

Upvotes: -1

VonC

Reputation: 1328152

The alternative would be, starting with Go 1.18 (Q1 2022):

sCopy := strings.Clone(s)

It comes from issues 40200 and 45038, and starts in CLs (change lists) 334884 and 345849:

bytes, strings: add Clone

Working directly with []byte and string are common and needing to copy them comes up fairly often.
This change adds a Clone helper to both strings and bytes to fill this need.
A benchmark was also added to provide evidence for why bytes.Clone was implemented with copy.

strings: add Clone function

The new strings.Clone function copies the input string without the returned cloned string referencing the input strings memory

See strings/clone.go

// Clone returns a fresh copy of s.
// It guarantees to make a copy of s into a new allocation,
// which can be important when retaining only a small substring
// of a much larger string. Using Clone can help such programs
// use less memory. Of course, since using Clone makes a copy,
// overuse of Clone can make programs use more memory.
// Clone should typically be used only rarely, and only when
// profiling indicates that it is needed.
// For strings of length zero the string "" will be returned
// and no allocation is made.
func Clone(s string) string {
    if len(s) == 0 {
        return ""
    }
    b := make([]byte, len(s))
    copy(b, s)
    return unsafe.String(&b[0], len(b))
}
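
As a usage sketch, the program below keeps a short prefix of a large string with and without strings.Clone and checks whether each result still points into the original. The helpers sharesMemory and keepPrefix are hypothetical names introduced for illustration, and sharesMemory relies on unsafe.StringData, which assumes Go 1.20 or later.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesMemory reports whether the bytes of s lie inside the bytes of parent.
// It compares data pointers obtained with unsafe.StringData (Go 1.20+).
func sharesMemory(parent, s string) bool {
	if len(parent) == 0 || len(s) == 0 {
		return false
	}
	start := uintptr(unsafe.Pointer(unsafe.StringData(parent)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(s)))
	return p >= start && p < start+uintptr(len(parent))
}

// keepPrefix returns a detached copy of the first n bytes of big.
func keepPrefix(big string, n int) string {
	return strings.Clone(big[:n])
}

func main() {
	big := strings.Repeat("payload;", 1<<20)

	sub := big[:7]                                     // plain slicing: no copy
	fmt.Println(sharesMemory(big, sub))                // true
	fmt.Println(sharesMemory(big, keepPrefix(big, 7))) // false
}
```

Only the cloned value lets the garbage collector reclaim big once nothing else refers to it.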

Upvotes: 16

vearutop

Reputation: 4072

Another way to clone a string is to concatenate it with a non-empty string, which allocates a new underlying buffer, and then slice the result back to the original length:

cloned := (s + ".")[:len(s)]
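
Wrapped in a function, the trick might look like the sketch below; cloneByConcat is a hypothetical name. Note that the new buffer is one byte longer than the string that is kept, and that the suffix has to be non-empty: the question already reports that appending "" did not produce a copy.

```go
package main

import "fmt"

// cloneByConcat copies s by concatenating a one-byte suffix,
// which forces a new buffer, and then slicing the suffix off again.
func cloneByConcat(s string) string {
	return (s + ".")[:len(s)]
}

func main() {
	// Build the input at run time so it is not a compile-time constant.
	big := fmt.Sprintf("id=42;%s", "rest of a large input")

	id := cloneByConcat(big[:5])
	fmt.Println(id) // id=42
}
```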

Upvotes: 1

Steven Dean

Reputation: 67

Use the following function to deep copy a string:

func deepCopy(s string) string {
    b := make([]byte, len(s))
    copy(b, s)
    return *(*string)(unsafe.Pointer(&b))
}

The function copies the data to a newly allocated slice of bytes. The function uses the unsafe package to convert the slice header to a string header with no copying of the bytes.

If direct use of the unsafe package is a concern, then use strings.Builder. The strings.Builder type executes the unsafe shenanigans under the covers.

func deepCopy(s string) string {
    var sb strings.Builder
    sb.WriteString(s)
    return sb.String()
}

There's no need to check the error returned from sb.WriteString. The Builder.WriteString method has an error return so that the Builder type satisfies the io.StringWriter interface, not because WriteString can return a non-nil error.

Upvotes: 5

Abhijit-K

Reputation: 3689

Strings in Go are immutable once created (see the Go spec).

I would prefer strings.Builder, as below. You keep appending to the builder's buffer (which is mutable) with WriteString, and once done you call String, which returns a string that points at that buffer rather than at another copy of it.

    somestring := "Hello Go"
    var sb strings.Builder
    if _, err := sb.WriteString(somestring); err != nil {
        //log & return
    }
    newstring := sb.String()

Check the implementation of Builder.String in the Go source. It reinterprets the buffer's slice header as a string header through an unsafe pointer cast, so there is no second copy.

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}

Upvotes: 6

Burak Serdar

Reputation: 51632

A string is implemented as a pointer to an underlying byte array plus the length of the string. When you slice an existing string, the new string still points into the same underlying array, possibly at a different offset and with a different length. That way, many small strings can share a single large underlying array.

As you noted, if you have a large string and you parse it to get smaller strings, you end up keeping the large string in memory, because the GC only knows about the underlying array and pointers to it. There are two ways you can deal with this:

  • Instead of a large string, keep a []byte or use a byte-stream based reader/scanner, and create strings from the input as you parse. That way the GC will collect the underlying []byte when parsing is done, and you will have your strings without the underlying large block.
  • Do what you already described, and deep-copy the string using string([]byte(s[x:y])), or by using copy.
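
A minimal sketch of the first option, assuming line-oriented input where only the first whitespace-separated field of each line is kept. The function name firstFields and the input format are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// firstFields scans data line by line and returns the first
// whitespace-separated field of every non-empty line.
func firstFields(data []byte) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		// sc.Bytes() is a view into the scanner's buffer and is only
		// valid until the next call to Scan.
		fields := bytes.Fields(sc.Bytes())
		if len(fields) == 0 {
			continue
		}
		// Converting []byte to string copies the bytes, so the result
		// does not keep the scanner's buffer or data alive.
		out = append(out, string(fields[0]))
	}
	return out, sc.Err()
}

func main() {
	data := []byte("alpha 1\n\nbeta 2\n")
	keys, err := firstFields(data)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(keys) // [alpha beta]
}
```

With its default settings, bufio.Scanner rejects lines longer than 64 KiB; inputs with longer lines would need a call to the scanner's Buffer method or a different reader.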

Upvotes: 4
