laurent

Reputation: 90736

Does Golang do any conversion when casting a byte slice to a string?

Does Golang do any conversion or otherwise try to interpret the bytes when casting a byte slice to a string? I've just tried with a byte slice containing a null byte, and it looks like the string is kept exactly as it is.

var test []byte
test = append(test, 'a')
test = append(test, 'b')
test = append(test, 0)
test = append(test, 'd')
s := string(test)      // convert the byte slice to a string
fmt.Println(s[2] == 0) // true: the null byte is still there

But what about byte slices containing invalid Unicode code points or invalid UTF-8 sequences? Could the cast fail, or could the data be corrupted?

Upvotes: 4

Views: 7150

Answers (2)

peterSO

Reputation: 166529

The Go Programming Language Specification

String types

A string type represents the set of string values. A string value is a (possibly empty) sequence of bytes.

Conversions

Conversions to and from a string type

Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.

string([]byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'})   // "hellø"
string([]byte{})                                     // ""
string([]byte(nil))                                  // ""

type MyBytes []byte
string(MyBytes{'h', 'e', 'l', 'l', '\xc3', '\xb8'})  // "hellø"

Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.

[]byte("hellø")   // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}
[]byte("")        // []byte{}

MyBytes("hellø")  // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}

A string value is a (possibly empty) sequence of bytes. A string value may or may not represent Unicode characters encoded in UTF-8. There is no interpretation of the bytes during the conversion from byte slice to string nor from string to byte slice. Therefore, the bytes will not be changed and the conversions will not fail.
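As a quick way to check this yourself, here is a minimal sketch that round-trips a deliberately malformed byte slice through a string. The helper name roundTrip is my own, purely for illustration; bytes.Equal and utf8.ValidString are from the standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// roundTrip converts b to a string and back, and reports whether the
// bytes survived unchanged. (Hypothetical helper, for illustration only.)
func roundTrip(b []byte) bool {
	s := string(b)
	return bytes.Equal(b, []byte(s))
}

func main() {
	// A NUL byte, a lone continuation byte, and a truncated two-byte
	// sequence: this is not valid UTF-8 text.
	raw := []byte{'a', 0x00, 0x80, 0xc3}
	s := string(raw)

	fmt.Println(len(s) == len(raw))  // same number of bytes
	fmt.Println(utf8.ValidString(s)) // the string is not valid UTF-8
	fmt.Println(roundTrip(raw))      // yet the bytes come back unchanged
}
```

If you need to know whether a string holds valid UTF-8, you have to ask explicitly (for example with utf8.ValidString); the conversion itself never checks.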

Upvotes: 10

joshlf

Reputation: 23537

No, the conversion can't fail, and it doesn't alter the bytes. Here's an example showing this (run in the Go Playground):

b := []byte{0x80}
s := string(b)
fmt.Println(s)
fmt.Println([]byte(s))
for _, c := range s {
    fmt.Println(c)
}

This prints:

�
[128]
65533

Note that ranging over invalid UTF-8 is well defined according to the Go spec:

For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
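To make the distinction concrete, here is a small sketch contrasting range (which decodes UTF-8) with plain indexing (which reads raw bytes). The helper runesOf is a name I made up for this example.

```go
package main

import "fmt"

// runesOf collects the runes produced by ranging over s.
// (Hypothetical helper, for illustration only.)
func runesOf(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := string([]byte{'a', 0x80, 'b'})

	// range decodes UTF-8: the invalid byte is reported as U+FFFD
	// and the iteration advances by exactly one byte.
	for i, r := range s {
		fmt.Printf("range: index %d, rune %U\n", i, r)
	}

	// Indexing reads the underlying bytes: 0x80 is still stored as-is.
	for i := 0; i < len(s); i++ {
		fmt.Printf("index: %d, byte 0x%02x\n", i, s[i])
	}

	fmt.Println(len(runesOf(s))) // one rune per byte in this particular string
}
```

So the replacement character only appears when something decodes the string; the string itself still contains the original byte.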

Upvotes: 5
