Reputation: 3882
I have problems with character encoding in Haskell. This simple program writes wrong results. What I am really interested in here is the encode function, which forces me to use ByteString. The program is:
import Data.ByteString.Char8 (unpack, pack)
import Data.ByteString.Lazy (toStrict)
import Data.Csv (encode) -- cabal install cassava
main = do
  -- (the middle character is a Polish diacritic letter)
  putStrLn $ unpack $ pack "aća"
  putStrLn $ unpack $ toStrict $ encode ["aća"]
It should print:
aća
a,ć,a
but instead it prints:
aa
a,Ä,a
This breaks my application's CSV encoding. It happens on Linux regardless of my locale settings:
$ locale
LANG=pl_PL.UTF-8
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=pl_PL.UTF-8
or
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
What I want to know is how to convert the output of encode (a Data.ByteString.Lazy.ByteString) to a String so I can write it to a file using, e.g., the writeFile function.
Upvotes: 0
Views: 928
Reputation: 14999
You should simply use Data.ByteString.Lazy.putStr rather than putStrLn . unpack . toStrict. There is no need to go through Text.
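A minimal sketch of the corrected second line of the question, assuming the cassava package is installed: the lazy ByteString returned by encode is written to stdout as-is, so its UTF-8 bytes reach the terminal unchanged instead of being reinterpreted one byte per Char.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Csv (encode) -- from the cassava package

main :: IO ()
main =
  -- encode already produces UTF-8 bytes, so write them out untouched
  -- rather than unpacking them into Chars and re-encoding with putStrLn.
  BL.putStr (encode ["aća"])
```

On a UTF-8 terminal this should print a,ć,a, followed by the CRLF line ending that encode uses by default.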
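Data.ByteString.Char8.unpack converts the byte with value n to the Unicode code point with value n. Don't use it on (non-ASCII) UTF-8 encoded text! Conversely, Data.ByteString.Char8.pack keeps only the lowest 8 bits of each Char, so ć (U+0107) becomes the byte 0x07, which is why your first line prints as aa.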
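A small illustration that needs only the bytestring package: the two UTF-8 bytes of ć (0xC4 0x87) come back from Char8.unpack as two unrelated Chars, and Char8.pack truncates each Char to its low 8 bits. The name utf8Bytes is just a local helper for this sketch.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Char (ord)

-- The two bytes that encode 'ć' (U+0107) in UTF-8.
utf8Bytes :: B.ByteString
utf8Bytes = B.pack [0xC4, 0x87]

main :: IO ()
main = do
  -- unpack turns byte n into the Char with code point n:
  -- U+00C4 ('Ä') and U+0087 (a control character), not 'ć'.
  print (map ord (B8.unpack utf8Bytes)) -- [196,135]
  -- pack keeps only the low 8 bits of each Char: 263 (U+0107) becomes 7 (BEL).
  print (B.unpack (B8.pack "aća"))      -- [97,7,97]
```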
Edit: I see you say you want to convert the result of encode to a String to write it to a file. Don't do that; use the IO functions like Data.ByteString.Lazy.writeFile instead.
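A sketch of writing the output of encode straight to a file, assuming cassava is installed; the file name out.csv and the sample rows are made up for illustration. The file is read back as raw bytes only to show that nothing was truncated or re-encoded along the way.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Csv (encode) -- from the cassava package

-- Illustrative rows containing Polish diacritics.
rows :: [[String]]
rows = [["imię", "miasto"], ["Łukasz", "Kraków"]]

-- Hypothetical output path, chosen only for this sketch.
outPath :: FilePath
outPath = "out.csv"

main :: IO ()
main = do
  -- encode yields UTF-8 bytes; write them unchanged, with no String step.
  BL.writeFile outPath (encode rows)
  -- Reading the file back as bytes gives exactly what encode produced.
  bytes <- BL.readFile outPath
  print (bytes == encode rows) -- True
```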
Upvotes: 3