Trismegistos

Reputation: 3882

Wrong character encoding in simple Haskell code

I have problems with character encoding in Haskell. This simple program writes wrong results. What I am really interested in here is cassava's encode function, which forces me to use ByteString. The application is:

import Data.ByteString.Char8 (unpack, pack)
import Data.ByteString.Lazy (toStrict)
import Data.Csv (encode) -- cabal install cassava

main = do
    -- (the middle character is a Polish diacritic letter)
    putStrLn $ unpack $ pack "aća"
    putStrLn $ unpack $ toStrict $ encode ["aća"]

It should print

aća
a,ć,a

but instead it writes

aa
a,Ä,a

This breaks my application, which encodes CSV. This happens on Linux regardless of my locale settings:

$ locale
LANG=pl_PL.UTF-8
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=pl_PL.UTF-8

or

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

What I want to know is how to convert the output of encode (a Data.ByteString.Lazy.ByteString) to a String so that I can write it to a file using, e.g., the writeFile function.

Upvotes: 0

Views: 928

Answers (1)

Reid Barton

Reputation: 14999

You should simply use Data.ByteString.Lazy.putStr rather than putStrLn . unpack . toStrict. There is no need to go through Text, either.
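A minimal sketch of the question's program rewritten along these lines, assuming the cassava package is installed. The String literal is printed with ordinary putStrLn (the stdout Handle encodes it using the locale's encoding), while the bytes returned by encode are passed to Data.ByteString.Lazy.putStr unchanged. The name csvBytes is introduced here only for illustration.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Csv (encode) -- cabal install cassava

-- The CSV bytes for the question's single record (three one-Char fields).
-- encode already returns UTF-8 encoded bytes, so no conversion is needed.
csvBytes :: BL.ByteString
csvBytes = encode ["aća"]

main :: IO ()
main = do
  -- A plain String needs no ByteString round trip; the stdout Handle
  -- encodes it with the locale's encoding (UTF-8 for the locales above).
  putStrLn "aća"
  -- Write the raw bytes to stdout without reinterpreting them as Chars.
  BL.putStr csvBytes
```

With a UTF-8 terminal this should show aća and a,ć,a; note that cassava ends each record with CRLF by default, so the second line is terminated by "\r\n".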

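Data.ByteString.Char8.unpack converts the byte with value n to the Unicode code point with value n, so the two UTF-8 bytes of ć (0xC4 0x87) come out as 'Ä' (U+00C4) followed by the invisible control character U+0087. Don't use it on (non-ASCII) UTF-8 encoded text! The same problem applies in reverse to Data.ByteString.Char8.pack, which truncates each Char to 8 bits: ć (U+0107) becomes the byte 0x07, which is why the first line prints as "aa".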
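The byte/Char mismatch described above can be observed with the bytestring package alone (no cassava needed). This sketch uses Data.ByteString.Builder's stringUtf8 as a stand-in for a correct UTF-8 encoder; the names sample, packed, utf8Bytes and misread are illustrative only.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder (stringUtf8, toLazyByteString)
import Data.Char (ord)

-- The question's test string; 'ć' is U+0107.
sample :: String
sample = "aća"

-- Char8.pack keeps only the low 8 bits of each Char,
-- so U+0107 becomes the single byte 0x07 (BEL).
packed :: [Int]
packed = map ord (BC.unpack (BC.pack sample))

-- A proper UTF-8 encoding: 'ć' becomes the two bytes 0xC4 0x87.
utf8Bytes :: BL.ByteString
utf8Bytes = toLazyByteString (stringUtf8 sample)

-- Char8.unpack maps each byte n to the Char with code point n,
-- so the two UTF-8 bytes turn into two unrelated Chars.
misread :: [Int]
misread = map ord (BC.unpack (BL.toStrict utf8Bytes))

main :: IO ()
main = do
  print packed  -- prints [97,7,97]
  print misread -- prints [97,196,135,97]
```

Code point 196 is 'Ä' and 135 is a C1 control character, which matches the a,Ä,a output in the question.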

Edit: I see you want to convert the result of encode to a String in order to write it to a file. Don't do that; use the ByteString IO functions such as Data.ByteString.Lazy.writeFile instead.
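A minimal sketch of writing the CSV straight to a file, assuming cassava is installed. The path "out.csv" and the sampleRows data are hypothetical, chosen only for illustration; each String field is encoded by cassava as UTF-8.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Csv (encode) -- cabal install cassava

-- Hypothetical output path, for illustration only.
outputPath :: FilePath
outputPath = "out.csv"

-- Hypothetical rows of String fields.
sampleRows :: [[String]]
sampleRows = [["aća", "b"]]

main :: IO ()
main =
  -- Write the bytes produced by encode directly, with no String step,
  -- so the file contains exactly the UTF-8 bytes cassava generated.
  BL.writeFile outputPath (encode sampleRows)
```

Because the file is written byte for byte, its contents do not depend on the locale or on any Handle encoding setting.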

Upvotes: 3
