Anton Litvinov

Reputation: 73

Convert utf-8 to single-byte encoding

I have a batch of incorrectly encoded records. This one-liner gives me the correct result:

cat example.txt | iconv -f utf-8 -t iso8859-2

But the following program gives me the error "encoding: rune not supported by encoding":

package main

import (
    "fmt"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    // UTF-8 bytes of the mis-encoded record.
    s := []byte{196, 144, 194, 154, 196, 144, 194, 176, 196, 144, 197, 186, 196, 144, 196, 190, 197, 131, 194, 128, 196, 144, 194, 176, 32, 52, 52, 53, 54, 50, 53, 54, 10, 10, 0, 0}
    fmt.Println(s)

    // NewEncoder converts from UTF-8 to ISO 8859-2.
    enc := charmap.ISO8859_2.NewEncoder()
    out, err := enc.Bytes(s)
    if err != nil {
        fmt.Println(err)
        return
    }
    expectedOutput := "Камера 4456256"
    fmt.Println("result", string(out), "expect:", expectedOutput)
}

Can my problem be solved without iconv bindings?

Upvotes: 0

Views: 2297

Answers (1)

typetetris

Reputation: 4867

Searching for charmap.ISO8859_2 gives the impression that you are using golang.org/x/text.

Here we see how the transformation is done, given a Charmap:

https://github.com/golang/text/blob/4d1c5fb19474adfe9562c9847ba425e7da817e81/encoding/charmap/charmap.go#L206

The linked line is where the error comes from. So your input contains either UTF-8 characters that cannot be represented in ISO 8859-2, or invalid UTF-8.

Here you can see that the error is handed back to you unchanged; the use of replacement inside the RepertoireError seems to be a red herring.

Of course you don't need iconv bindings. You can iterate through your input character by character, encode each one as ISO 8859-2, and decide for yourself what to do with unrepresentable characters.
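A minimal sketch of that loop, using only the standard library so it stands on its own. The names iso8859_2High, buildEncodeTable and encodeISO8859_2 are made up for this illustration, and the hand-written table is my own transcription of the upper half of ISO 8859-2. I am also assuming that the C1 control range U+0080..U+009F should map to the same byte values, as iconv's ISO-8859-2 does. In real code you could probably drop the table and call charmap.ISO8859_2.EncodeRune from golang.org/x/text instead.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// iso8859_2High lists the runes for bytes 0xA0..0xFF of ISO 8859-2, in order.
// Hypothetical helper table, transcribed by hand for this sketch.
const iso8859_2High = "\u00a0Ą˘Ł¤ĽŚ§¨ŠŞŤŹ\u00adŽŻ" +
	"°ą˛ł´ľśˇ¸šşťź˝žż" +
	"ŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎ" +
	"ĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢß" +
	"ŕáâăäĺćçčéęëěíîď" +
	"đńňóôőö÷řůúűüýţ˙"

// buildEncodeTable inverts iso8859_2High into a rune -> byte lookup.
func buildEncodeTable() map[rune]byte {
	table := make(map[rune]byte, 96)
	b := 0xA0
	for _, r := range iso8859_2High {
		table[r] = byte(b)
		b++
	}
	return table
}

// encodeISO8859_2 walks src rune by rune and encodes it as ISO 8859-2.
// Invalid UTF-8 and unrepresentable runes become replacement; the second
// result counts how many substitutions were made.
func encodeISO8859_2(src []byte, replacement byte) ([]byte, int) {
	table := buildEncodeTable()
	out := make([]byte, 0, len(src))
	replaced := 0
	for len(src) > 0 {
		r, size := utf8.DecodeRune(src)
		src = src[size:]
		switch {
		case r == utf8.RuneError && size == 1:
			// Invalid UTF-8 byte: substitute and keep going.
			out = append(out, replacement)
			replaced++
		case r < 0xA0:
			// ASCII plus C0/C1 controls: assumed to map to the same byte value.
			out = append(out, byte(r))
		default:
			if b, ok := table[r]; ok {
				out = append(out, b)
			} else {
				// Rune has no ISO 8859-2 byte: this is where you decide what to do.
				out = append(out, replacement)
				replaced++
			}
		}
	}
	return out, replaced
}

func main() {
	// The bytes from the question.
	s := []byte{196, 144, 194, 154, 196, 144, 194, 176, 196, 144, 197, 186,
		196, 144, 196, 190, 197, 131, 194, 128, 196, 144, 194, 176,
		32, 52, 52, 53, 54, 50, 53, 54, 10, 10, 0, 0}
	out, replaced := encodeISO8859_2(s, '?')
	fmt.Printf("result: %q, replaced: %d\n", out, replaced)
}
```

Because the loop reports how many runes were substituted instead of stopping at the first one, you can log or inspect the offending records rather than losing the whole batch. Under the C1 assumption above, the bytes from the question should come out as the UTF-8 encoding of "Камера 4456256" followed by the two newlines and two NUL bytes.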

Upvotes: 2
