stef

Reputation: 313

bytes.String() vs bytes.Bytes() in Go

Consider a text file like this:

Some text
here.
---
More text
another line.
---
Third part of text.

I want to split it into three parts, divided by the --- separator. The parts should be stored in a map.

Below are two otherwise identical programs that differ only in the type used for the map values.

When I use string, everything works fine:

KEY: 0
Some text
here.
KEY: 1
More text
another line.
KEY: 2
Third part of text.

https://play.golang.org/p/IcGdoUNcTEe

When I use []byte, things get messed up:

KEY: 0
Third part of teKEY: 1
Third part of text.
ne.
KEY: 2
Third part of text.

https://play.golang.org/p/jqLhCrqsvOs

Why?


Program 1 (string):

func main() {
    parts := parseParts([]byte(input))

    for k, v := range parts {
        fmt.Printf("KEY: %d\n%s", k, v)
    }
}

func parseParts(input []byte) map[int]string {
    parts := map[int]string{}
    s := bufio.NewScanner(bytes.NewReader(input))
    buf := bytes.Buffer{}
    i := 0
    for s.Scan() {
        if s.Text() == "---" {
            parts[i] = buf.String()
            buf.Reset()
            i++
            continue
        }
        buf.Write(s.Bytes())
        buf.WriteString("\n")
    }
    parts[i] = buf.String()
    return parts
}

Program 2 ([]byte):

func main() {
    parts := parseParts([]byte(input))

    for k, v := range parts {
        fmt.Printf("KEY: %d\n%s", k, v)
    }
}

func parseParts(input []byte) map[int][]byte {
    parts := map[int][]byte{}
    s := bufio.NewScanner(bytes.NewReader(input))
    buf := bytes.Buffer{}
    i := 0
    for s.Scan() {
        if s.Text() == "---" {
            parts[i] = buf.Bytes()
            buf.Reset()
            i++
            continue
        }
        buf.Write(s.Bytes())
        buf.WriteString("\n")
    }
    parts[i] = buf.Bytes()
    return parts
}

Upvotes: 0

Views: 464

Answers (2)

Darshan Rivka Whittle

Reputation: 34031

In the string version,

parts[i] = buf.String()

sets parts[i] to a new string every time. In the []byte version,

parts[i] = buf.Bytes()

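sets parts[i] to a byte slice that shares the buffer's backing array every time. buf.Reset() empties the buffer but keeps that array, so later writes overwrite the bytes the earlier slices still point to. Each slice keeps the length it had when it was created, which is why all three start with the text of the last part: the first is cut off at its original length, and the second is longer than the last part, so it ends with leftover bytes ("ne.") from its own original content.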
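The aliasing can be reproduced in isolation. The following is a minimal sketch, not code from the question: aliasDemo is a hypothetical helper name, and the overwritten result relies on how the current standard library reuses the buffer's storage after Reset, which the documentation does not guarantee.

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo (hypothetical helper) returns two slices obtained from the
// same bytes.Buffer, one taken before a Reset and one taken after.
func aliasDemo() (before, after []byte) {
	var buf bytes.Buffer

	buf.WriteString("first")
	before = buf.Bytes() // shares the buffer's internal storage, no copy

	buf.Reset() // empties the buffer but keeps the underlying array
	buf.WriteString("SECOND")
	after = buf.Bytes()

	return before, after
}

func main() {
	before, after := aliasDemo()
	// With the current implementation, "before" keeps its length of 5
	// but now shows bytes written after the Reset.
	fmt.Printf("before=%q after=%q\n", before, after)
}
```

The slice taken before the Reset still has length 5, but its bytes were replaced by the later write, which is the same effect seen in the question's output.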

You could replace the byte slice line

parts[i] = buf.Bytes()

with something like this:

bb := buf.Bytes()
b := make([]byte, len(bb))
copy(b, bb)
parts[i] = b

in order to get the behavior to match the string version. But the string version is easier and better matches what you seem to be trying to do.
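Put together, a []byte version with the copy applied might look like the sketch below. It assumes the input shown in the question (written here as a constant with a trailing newline), uses append onto a nil slice as a shorthand for the make-and-copy steps above, and iterates by index in main because ranging over a map does not guarantee key order.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// input mirrors the sample file from the question (assumed to end with a newline).
const input = "Some text\nhere.\n---\nMore text\nanother line.\n---\nThird part of text.\n"

// parseParts splits input on lines consisting of "---" and returns each
// part as an independent byte slice.
func parseParts(input []byte) map[int][]byte {
	parts := map[int][]byte{}
	s := bufio.NewScanner(bytes.NewReader(input))
	var buf bytes.Buffer
	i := 0
	for s.Scan() {
		if s.Text() == "---" {
			// Copy before Reset so the stored slice no longer shares
			// memory with the buffer.
			parts[i] = append([]byte(nil), buf.Bytes()...)
			buf.Reset()
			i++
			continue
		}
		buf.Write(s.Bytes())
		buf.WriteByte('\n')
	}
	parts[i] = append([]byte(nil), buf.Bytes()...)
	return parts
}

func main() {
	parts := parseParts([]byte(input))
	// Iterate by index: map iteration order is not specified.
	for k := 0; k < len(parts); k++ {
		fmt.Printf("KEY: %d\n%s", k, parts[k])
	}
}
```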

Upvotes: 5

robx

Reputation: 2329

The difference is that bytes.Buffer.String copies the memory, while bytes.Buffer.Bytes does not. Quoting the documentation,

The slice is valid for use only until the next buffer modification (that is, only until the next call to a method like Read, Write, Reset, or Truncate).
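By contrast, a value obtained from String stays valid after the buffer is modified, because Go strings are immutable and the conversion copies the bytes. A small sketch to illustrate this (snapshot is a hypothetical helper name, not part of the question's code):

```go
package main

import (
	"bytes"
	"fmt"
)

// snapshot (hypothetical helper) captures the buffer contents with String
// and then reuses the buffer.
func snapshot() string {
	var buf bytes.Buffer
	buf.WriteString("first")
	s := buf.String() // copies the bytes into a new string

	buf.Reset()
	buf.WriteString("SECOND") // does not affect s
	return s
}

func main() {
	fmt.Println(snapshot()) // prints "first"
}
```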

Upvotes: 3
