Reputation: 85

Haskell not parsing text correctly

In a test file I have the following test String:

部類 Test《

I've tried saving the file as UTF-8 (both with and without a BOM) and as UCS-2. I've also tried setting Haskell's encoding to UTF-8.

The text always comes out like this (or worse):

"\8745\9559\9488\920\226\191\920\237\8359 Test\960\199\232"

When I type print "《" directly, the output is "\12298", not the \960\199\232 I get when reading the file.

Any solutions for this behaviour?

Upvotes: 3

Answers (1)

Daniel Wagner

Reputation: 153222

At a guess: you are using readFile (or something similar) and your locale's encoding is neither UTF-8 nor UCS-2. You can fix this by explicitly setting the encoding of the handles you read from (the file) and write to (stdout or wherever). For example, the following program reliably reads and writes your test file correctly for me:

import System.IO

main = do
    hSetEncoding stdout utf8
    withFile "test.txt" ReadMode $ \h -> do
        hSetEncoding h utf8
        s <- hGetContents h
        print s
        putStr s
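Since the file may start with a byte-order mark, note that reading with utf8 keeps a leading BOM as the character '\65279' at the start of the string. System.IO also provides utf8_bom, which drops a leading BOM on input if one is present (and writes one on output). A minimal sketch of the same program using it; the helper name readUtf8File is made up for illustration, not a library function:

```haskell
import System.IO

-- Read a whole file as UTF-8. utf8_bom (from System.IO) strips a leading
-- byte-order mark if present; plain utf8 would keep it as '\65279'.
-- readUtf8File is a hypothetical helper, not part of base.
readUtf8File :: FilePath -> IO String
readUtf8File path = withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    s <- hGetContents h
    length s `seq` return s   -- force the lazy read before withFile closes h

main :: IO ()
main = do
    hSetEncoding stdout utf8
    s <- readUtf8File "test.txt"
    print s
    putStr s
```

For the question's file saved with a BOM, print s should then start with "\37096\39006 Test\12298" rather than "\65279...".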

Another option is to run your existing program with an appropriate locale; for example, try:

LANG=en_US.utf8 runhaskell test.hs

In most common POSIX-style shells, this sets the LANG environment variable for just that one run of test.hs.
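To check the diagnosis, the sketch below (assuming GHC) prints the encoding GHC picked up from the environment, then tries to reproduce the question's output by writing the test string as UTF-8 with a BOM and reading the bytes back as code page 437, which a US-English Windows console typically uses. The first three garbled characters, ∩╗┐, are exactly the BOM bytes EF BB BF seen through CP437. The helper misdecode and the scratch file mojibake.txt are made up for illustration, and mkTextEncoding "CP437" assumes your platform recognises that name (Windows code pages and glibc's iconv do):

```haskell
import System.IO

-- Hypothetical helper: write str as UTF-8 (with a BOM, like the question's
-- file) to a scratch file, then read the same bytes back using encName.
misdecode :: String -> String -> IO String
misdecode encName str = do
    enc <- mkTextEncoding encName
    withFile "mojibake.txt" WriteMode $ \h -> do
        hSetEncoding h utf8_bom
        hPutStr h str
    withFile "mojibake.txt" ReadMode $ \h -> do
        hSetEncoding h enc
        s <- hGetContents h
        length s `seq` return s   -- force the read before the handle closes

main :: IO ()
main = do
    -- What GHC derived from the environment (LANG, or the console code page)
    putStrLn ("locale encoding: " ++ show localeEncoding)
    enc <- hGetEncoding stdout
    putStrLn ("stdout encoding: " ++ show enc)
    -- "部類 Test《" written with escapes, then decoded as code page 437
    misdecode "CP437" "\37096\39006 Test\12298" >>= print
```

The last line should print "\8745\9559\9488\920\226\191\920\237\8359 Test\960\199\232", the same string as in the question, which suggests the file itself is fine and only the decoding side needs fixing.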

Upvotes: 8
