Reputation: 153
I've been working with some huge files that I have to convert to UTF-8. Because the files are enormous, traditional tools like iconv won't work, so I decided to write my own tool in Go. However, I noticed that this encoding conversion is quite slow in Go. Here is my code:
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintf(os.Stderr, "usage:\n\t%s [input] [output]\n", os.Args[0])
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	out, err := os.Create(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	r := charmap.ISO8859_1.NewDecoder().Reader(f)
	buf := make([]byte, 1048576)
	io.CopyBuffer(out, r, buf)
	out.Close()
	f.Close()
}
Similar code in Python is much more performant:
import codecs

BLOCKSIZE = 1048576  # or some other, desired size in bytes
with codecs.open("FRWAC-01.xml", "r", "latin_1") as sourceFile:
    with codecs.open("FRWAC-01-utf8.xml", "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)
I was sure my Go code would be much quicker, because I/O in Go is generally fast, but it turns out to be much slower than the Python code. Is there a way to improve the Go program?
Upvotes: 0
Views: 1079
Reputation: 109347
The problem here is that you're not comparing the same code in both cases. Also, I/O speed in Go can't be significantly different than in Python, since both are making the same syscalls.
In the Python version, the files are buffered by default. In the Go version, while you're using io.CopyBuffer with a 1048576-byte buffer, the decoder is going to make whatever size Read calls it needs directly on the unbuffered file.
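To make the effect of small Read calls concrete, here is a minimal, stdlib-only sketch that counts how many Read calls reach the underlying source with and without a bufio.Reader in between. The countingReader type and the drain and countReads helpers are hypothetical names for illustration, and the 4096-byte chunk size is only an assumption standing in for whatever read size the decoder picks internally.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader counts how many Read calls reach the underlying reader.
// It stands in for the *os.File, where each Read is one read syscall.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// drain reads src to EOF in chunkSize-byte requests, imitating a consumer
// (such as a decoder) that chooses its own read size.
func drain(src io.Reader, chunkSize int) (int, error) {
	buf := make([]byte, chunkSize)
	total := 0
	for {
		n, err := src.Read(buf)
		total += n
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// countReads reports how many Read calls hit the source when size bytes are
// drained in chunkSize requests. If bufSize > 0, a bufio.Reader of that size
// sits between the consumer and the source.
func countReads(size, chunkSize, bufSize int) int {
	src := &countingReader{r: bytes.NewReader(make([]byte, size))}
	var r io.Reader = src
	if bufSize > 0 {
		r = bufio.NewReaderSize(src, bufSize)
	}
	if _, err := drain(r, chunkSize); err != nil {
		panic(err)
	}
	return src.calls
}

func main() {
	const size = 8 << 20 // 8 MiB of input (assumed, for illustration)
	fmt.Println("unbuffered:", countReads(size, 4096, 0))
	fmt.Println("buffered:  ", countReads(size, 4096, 1<<20))
}
```

With a real file, each of those underlying Read calls is a syscall, so the buffered variant should issue far fewer of them for the same amount of data.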
Wrapping the file IO with bufio will produce comparable results.
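inFile, err := os.Open(os.Args[1])
if err != nil {
	log.Fatal(err)
}
defer inFile.Close()

outFile, err := os.Create(os.Args[2])
if err != nil {
	log.Fatal(err)
}
defer outFile.Close()

// Buffer both sides with 1 MiB buffers, so the decoder's Read calls
// are served from memory rather than going to the file each time.
in := bufio.NewReaderSize(inFile, 1<<20)
out := bufio.NewWriterSize(outFile, 1<<20)
// Deferred calls run last-in, first-out: Flush runs before outFile.Close.
defer out.Flush()

r := charmap.ISO8859_1.NewDecoder().Reader(in)
if _, err := io.Copy(out, r); err != nil {
	log.Fatal(err)
}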
Upvotes: 4