Stuart

Reputation: 4258

golang, £ char causing weird Â character

I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £.

I've reproduced it to the following minimal example:

func foo() string {
    validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*"
    var result strings.Builder

    for i := 0; i < len(validChars); i++ {

        currChar := validChars[i]
        result.WriteString(string(currChar))
    }
    return result.String()
}

I would expect this to return

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*

But it doesn't; it produces

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!Â£$%^&*
                                                                  ^
                                             where did you come from ?

If I take the £ sign out of the original validChars string, that weird Â goes away.

func foo() string {
    validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*"
    var result strings.Builder

    for i := 0; i < len(validChars); i++ {

        currChar := validChars[i]
        result.WriteString(string(currChar))
    }
    return result.String()
}

This produces abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*

Upvotes: 2

Views: 675

Answers (1)

Emile P.

Reputation: 3962

A string in Go is a read-only sequence of bytes, so indexing it yields a byte. Your mental model of a string is probably that it consists of a slice of characters or, as we call it in Go, a slice of rune.

For most of the runes in your validChars string this makes no difference, since they are ASCII characters and are therefore encoded as a single byte in UTF-8. The £ rune, however, is encoded as 2 bytes (0xC2 0xA3).

Now consider the string £: it consists of 1 rune but 2 bytes. Indexing it, as you are effectively doing in your sample, returns only one of the two bytes that encode £. Converting that single byte back to a string treats its value as a code point of its own, so the first byte (0xC2) becomes Â (U+00C2) and the second byte (0xA3) becomes £ (U+00A3). That is why the output contains Â£.
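To make the byte/rune split concrete, here is a minimal sketch. The helper name describe is made up for illustration and is not part of the question's code; it only reports the byte length of a string and what you get when its first byte alone is converted back to a string.

```go
package main

import "fmt"

// describe is a hypothetical helper: it returns the number of bytes in s
// and the string produced by converting only its first byte.
func describe(s string) (int, string) {
	first := s[0] // indexing a string yields a byte, not a rune
	// Converting a byte to a string treats its numeric value as a Unicode
	// code point and encodes that code point as UTF-8.
	return len(s), string(first)
}

func main() {
	n, firstAsString := describe("£")
	fmt.Printf("bytes: %d, raw bytes: % x, first byte as string: %q\n", n, "£", firstAsString)
}
```

For £ the first byte is 0xC2, and code point U+00C2 is Â, which matches the stray character in the question's output.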


The fix for your problem is to convert validChars to a []rune first. You can then access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.
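As a sketch of that fix, the two functions below are hypothetical variants of foo from the question (the names fooRunes and fooRange are mine). The first indexes a []rune as described above; the second ranges over the string directly, which also decodes one rune per iteration.

```go
package main

import (
	"fmt"
	"strings"
)

// fooRunes converts the string to []rune first, so each index
// refers to a whole character rather than a single byte.
func fooRunes() string {
	validChars := []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*")
	var result strings.Builder

	for i := 0; i < len(validChars); i++ {
		result.WriteRune(validChars[i])
	}
	return result.String()
}

// fooRange relies on range over a string, which yields runes
// (decoded from UTF-8) instead of bytes.
func fooRange() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*"
	var result strings.Builder

	for _, r := range validChars {
		result.WriteRune(r)
	}
	return result.String()
}

func main() {
	fmt.Println(fooRunes())
	fmt.Println(fooRange())
}
```

Both variants should return the input unchanged. The []rune conversion copies the string once, which is usually fine for a short character set like this one.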

Also note that len(validChars) will give you the count of bytes in the string. To get the count of runes, use utf8.RuneCountInString instead.

Finally, here's a blog post from Rob Pike on the subject that you may find interesting.

Upvotes: 9
