Ali Bahrami

Reputation: 6073

How to convert from an encoding to UTF-8 in Go?

I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.

How do I do this in Go?

Upvotes: 20

Views: 47899

Answers (3)

igonejack

Reputation: 2532

I made a tool for myself; maybe you can borrow some ideas from it :)

https://github.com/gonejack/transcode

This is the key code:

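w := transform.NewWriter(output, targetEncoding.NewEncoder())
r := transform.NewReader(input, sourceEncoding.NewDecoder())
// decode input to UTF-8, then re-encode it as the target encoding
if _, err = io.Copy(w, r); err == nil {
    // Close flushes any bytes the encoder is still holding
    err = w.Close()
}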

Upvotes: 0

Alexis Wilke

Reputation: 20730

I checked the docs and came up with a way to convert a byte slice to (or from) UTF-8.

What I'm having a hard time with is that, so far, I haven't found an interface that lets me pick an encoding by locale. Instead, the choices seem limited to a predefined set of encodings.

In my case, I needed to convert UTF-16 (really I have UCS-2 data, but it should still work) to UTF-8. To do that, I check for the BOM and then do the conversion:

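// needs: errors, golang.org/x/text/encoding, golang.org/x/text/encoding/unicode
if len(buf) < 2 {
    return errors.New("BOM missing")
}

// Read the first two bytes as a little-endian uint16:
// FF FE gives 0xFEFF (little endian), FE FF gives 0xFFFE (big endian).
bom := uint16(buf[0]) | uint16(buf[1])<<8

var enc encoding.Encoding
if bom == 0xFEFF {
    enc = unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM)
} else if bom == 0xFFFE {
    enc = unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
} else {
    return errors.New("BOM missing")
}

e := enc.NewDecoder()

// convert UCS-2 (LE or BE) to UTF-8, skipping the 2-byte BOM
utf8Data, err := e.Bytes(buf[2:])
if err != nil {
    return err
}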

It's unfortunate that I have to use IgnoreBOM, since in my case a BOM should instead be forbidden after the first character, but that's close enough for my situation. These functions are mentioned in a couple of places, but I couldn't find them shown in practice.

Upvotes: 0

rob74

Reputation: 5238

You can use the golang.org/x/text/encoding package, which supports Windows-1256 via the golang.org/x/text/encoding/charmap package (in the example below, import that package and use charmap.Windows1256 instead of japanese.ShiftJIS).

Here's a short example that encodes a Japanese UTF-8 string to Shift JIS and then decodes the Shift JIS bytes back to UTF-8. At the time of writing it didn't work on the playground, since the playground didn't have the "x" packages.

package main

import (
    "bytes"
    "fmt"
    "io"
    "strings"

    "golang.org/x/text/encoding/japanese"
    "golang.org/x/text/transform"
)

func main() {
    // the string we want to transform
    s := "今日は"
    fmt.Println(s)

    // --- Encoding: convert s from UTF-8 to Shift JIS
    // declare a bytes.Buffer b and an encoder which will write into this buffer
    var b bytes.Buffer
    sjisWriter := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
    // encode our string
    if _, err := sjisWriter.Write([]byte(s)); err != nil {
        panic(err)
    }
    // Close flushes any data still buffered in the encoder
    if err := sjisWriter.Close(); err != nil {
        panic(err)
    }
    // print the encoded bytes
    fmt.Printf("%#v\n", b.Bytes())
    encS := b.String()
    // raw Shift JIS bytes: these look garbled on a UTF-8 terminal
    fmt.Println(encS)

    // --- Decoding: convert encS from Shift JIS to UTF-8
    // declare a decoder which reads from the string we have just encoded
    utf8Reader := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
    // decode our string
    decBytes, err := io.ReadAll(utf8Reader)
    if err != nil {
        panic(err)
    }
    decS := string(decBytes)
    fmt.Println(decS)
}

There's a more complete example on the Japanese Stack Overflow site. The text is in Japanese, but the code should be self-explanatory: https://ja.stackoverflow.com/questions/6120

Upvotes: 21
