Reputation: 4258
I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £.
I've reduced it to the following minimal example:
func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*"
	var result strings.Builder
	for i := 0; i < len(validChars); i++ {
		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}
I would expect this to return
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*
But it doesn't, it produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!Â£$%^&*
                                                                  ^
                                                      where did you come from?
If I take the £ sign out of the original validChars string, that weird A goes away.
func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*"
	var result strings.Builder
	for i := 0; i < len(validChars); i++ {
		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}
This produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*
Upvotes: 2
Views: 675
Reputation: 3962
A string in Go is, in effect, a read-only slice of bytes ([]byte). Your mental model of a string is probably that it consists of a slice of characters or, as we call them in Go, a slice of runes.
For most of the runes in your validChars string this makes no difference, as they are ASCII characters and can therefore be represented in a single byte in UTF-8. However, the £ rune is represented as 2 bytes (0xC2 0xA3).
Now consider the string £: it consists of 1 rune but 2 bytes. Because indexing a string yields bytes, validChars[i] in your sample returns only one of the two bytes that represent £ at a time. Converting a single byte back to a string treats its value as a Unicode code point, so the first byte (0xC2) becomes Â (U+00C2) and the second byte (0xA3) happens to become £ (U+00A3). That is why an extra Â appears in front of the £.
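To make the byte-versus-rune behaviour concrete, here is a minimal sketch. The helper byteAtAsString is a hypothetical name introduced only for illustration; it mirrors what the loop body in the question does for a single index.

```go
package main

import "fmt"

// byteAtAsString is a hypothetical helper that mirrors the question's loop
// body: indexing a string yields a byte, and converting that byte to a
// string interprets its value as a Unicode code point.
func byteAtAsString(s string, i int) string {
	return string(s[i])
}

func main() {
	s := "£"
	fmt.Println(len(s))               // 2: "£" takes two bytes in UTF-8
	fmt.Printf("% x\n", s)            // c2 a3
	fmt.Println(byteAtAsString(s, 0)) // Â (byte 0xC2 read as U+00C2)
	fmt.Println(byteAtAsString(s, 1)) // £ (byte 0xA3 read as U+00A3)
}
```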
The fix for your problem is to first convert the string validChars to a []rune. Then you can access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.
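One way the []rune conversion could look is sketched below. This is an illustration based on the description above and may differ from the code in the linked playground.

```go
package main

import (
	"fmt"
	"strings"
)

// foo is a sketch of the corrected function: validChars is converted to a
// []rune up front, so indexing yields whole characters instead of bytes.
func foo() string {
	validChars := []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*")
	var result strings.Builder
	for i := 0; i < len(validChars); i++ {
		// WriteRune encodes the rune back to UTF-8 (two bytes for £).
		result.WriteRune(validChars[i])
	}
	return result.String()
}

func main() {
	fmt.Println(foo())
}
```

If you only need to walk the string from start to end, ranging over it directly (for _, r := range s) also yields runes rather than bytes. The []rune conversion is the better fit when you need random access by character index, as in a random string generator.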
Also note that len(validChars) gives you the count of bytes in the string. To get the count of runes, use utf8.RuneCountInString instead.
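The difference between the two counts can be checked with a short sketch. The helper name counts is an assumption made for this example, and the input is just the punctuation tail of validChars.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning both the byte length and the
// rune count of s, so the two can be compared side by side.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("~@:!£$%^&*")
	fmt.Println(b, r) // 11 10: ten characters, but £ takes two bytes
}
```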
Finally, here's a blog post from Rob Pike on the subject that you may find interesting.
Upvotes: 9