Reputation: 85
In a test file I have the following test String:
部類 Test《
I've tried encoding the file as UTF-8, both with and without a BOM, and as UCS-2. I've tried setting the encoding in Haskell to UTF-8 as well.
The text always comes out as (or worse):
"\8745\9559\9488\920\226\191\920\237\8359 Test\960\199\232"
Whenever I type print "《" the result is "\12298", and not \960\199\232 as seen when reading the file.
Any solutions for this behaviour?
Upvotes: 3
Views: 286
Reputation: 153222
At a guess: you are using readFile or something similar, under a locale that is neither UTF-8 nor UCS-2. You can fix this by explicitly setting the encoding of the handle you read from (the file handle) and the handle you write to (stdout or whatever). For example, the following program reliably reads and writes your test file correctly for me:
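import System.IO

main :: IO ()
main = do
    -- Write to stdout as UTF-8, regardless of the locale.
    hSetEncoding stdout utf8
    withFile "test.txt" ReadMode $ \h -> do
        -- Decode the file as UTF-8, regardless of the locale.
        hSetEncoding h utf8
        s <- hGetContents h
        print s
        putStr s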
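If the file was saved with a BOM, decoding it with utf8 should leave the BOM in the string as a leading '\65279'. The following is a minimal sketch, assuming GHC's System.IO, of a reader that takes the encoding as a parameter so that utf8_bom (which skips a leading BOM on input) can be used instead. The name readFileWith is a hypothetical helper, not part of the program above.

```haskell
import System.IO

-- | Hypothetical helper: read a whole text file with an explicit encoding,
-- ignoring whatever the locale would have picked.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    -- hGetContents is lazy; force the whole string before withFile
    -- closes the handle.
    length s `seq` return s
```

With this, readFileWith utf8_bom "test.txt" should return the text without the BOM, whether or not the file starts with one, while readFileWith utf8 "test.txt" keeps it as the first character.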
Another option is to run your existing program with an appropriate locale; for example, try:
LANG=en_US.utf8 runhaskell test.hs
In the most-used modern shells, this will set the LANG environment variable appropriately for a single run of the program in test.hs.
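As for where the strange numbers come from: the escaped string in the question looks like what you get when the UTF-8 bytes of the file (BOM included) are decoded one byte at a time as code page 437, a common default for the Windows console. That is an inference from the numbers, not something the question states. The sketch below reproduces it with a hand-written UTF-8 encoder and a partial CP437 table that covers only the bytes occurring in this test file; utf8Bytes, fromCp437, and misdecode are hypothetical names.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode a string as UTF-8 bytes (no BOM is added).
utf8Bytes :: String -> [Word8]
utf8Bytes = concatMap utf8Char

utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    top, cont :: Int -> Word8
    top s  = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- | Partial CP437 table: only the high bytes that occur in the test file.
cp437Sample :: [(Word8, Char)]
cp437Sample =
  [ (0x80, '\199'),  (0x83, '\226'),  (0x8A, '\232'),  (0x9E, '\8359')
  , (0xA1, '\237'),  (0xA8, '\191'),  (0xBB, '\9559'), (0xBF, '\9488')
  , (0xE3, '\960'),  (0xE9, '\920'),  (0xEF, '\8745')
  ]

-- | Decode one byte as CP437; bytes missing from the sample table become
-- U+FFFD (a placeholder chosen for this sketch).
fromCp437 :: Word8 -> Char
fromCp437 b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = maybe '\xFFFD' id (lookup b cp437Sample)

-- | Encode as UTF-8, then wrongly decode the bytes as CP437.
misdecode :: String -> String
misdecode = map fromCp437 . utf8Bytes
```

Here misdecode "\65279\37096\39006 Test\12298" (the BOM followed by the test string) evaluates to "\8745\9559\9488\920\226\191\920\237\8359 Test\960\199\232", the string shown in the question. To see which encoding the runtime actually chose by default, you can print localeEncoding from System.IO.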
Upvotes: 8