Reputation: 4435
I'm having trouble parsing UTF-8 characters into Text when deriving a Read instance. For example, when I run the following in ghci...
> import Data.Text
> data Message = Message Text deriving (Read, Show)
> read ("Message \"→\"") :: Message
Message "\8594"
Can I do anything to keep the text inside Message UTF-8 encoded? I.e., the result should be...
Message "→"
(P.S. I already receive my serialized messages as Text, but currently need to unpack to a String in order to call read. I'd love to avoid this...)
EDIT: Ah sorry, the answers rightly point out that it's show, not read, that converts to "\8594". Is there a way to show and convert back to Text again without the backslash encoding?
Upvotes: 2
Views: 2947
Reputation: 77374
To the best of my knowledge, the internal encoding used by Text (UTF-16 in versions of the text package before 2.0, UTF-8 from 2.0 onward) is consistent and not exposed directly. If you want UTF-8, you can decode/encode a Text value as appropriate. Similarly, it doesn't make sense to talk about an encoding for String, because that's just a list of Char, where each Char is a Unicode code point.
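To make the "list of code points" point concrete, here is a minimal sketch. The names arrowString, codePoints, and viaText are made up for illustration; only ord, T.pack, and T.unpack come from the libraries.

```haskell
import Data.Char (ord)
import qualified Data.Text as T

-- A String is just [Char]; each Char is one Unicode code point.
-- "\8594" is the arrow from the question (U+2192), written as an escape.
arrowString :: String
arrowString = "\8594"

-- Hypothetical helper: list the code points of a String.
codePoints :: String -> [Int]
codePoints = map ord

-- Going through Text and back does not change the characters.
viaText :: String
viaText = T.unpack (T.pack arrowString)
```

Here codePoints arrowString is a one-element list containing 8594. No byte encoding is involved at any point.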
Most likely, it's only the Show instance for Text displaying things differently here.
Also, keep in mind that (by consistent convention in the standard libraries) read and show are expected to behave as (de-)serialization functions, with a "serialized" format that, interpreted as a Haskell expression, describes a value equivalent to the one being (de-)serialized. As such, backslash escapes in ASCII text are often preferred for being widely supported and unambiguous. If you want to display a Text value with the actual code points, show isn't what you want.
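As a rough illustration of the difference between showing and displaying, consider the sketch below. The names arrow, shownArrow, and display are hypothetical. What actually appears on screen from display depends on the handle's encoding, which is normally taken from the locale.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- The arrow from the question, written as an escape (U+2192).
arrow :: T.Text
arrow = T.pack "\8594"

-- show produces a Haskell string literal: surrounding quotes plus an
-- ASCII backslash escape for the non-ASCII character.
shownArrow :: String
shownArrow = show arrow

-- Hypothetical helper: write the characters themselves, not a literal.
display :: T.Text -> IO ()
display = TIO.putStrLn
```

In ghci, putStrLn shownArrow prints the escaped literal, while display arrow prints the arrow itself on a UTF-8 terminal.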
I'm not entirely clear on what you want to do with the Text; using show directly is exactly what you're trying to avoid. If you want to display text in a terminal window, the terminal is going to dictate the encoding, and you want the functions defined in Data.Text.IO. If you need to convert to a specific encoding for whatever other reason, Data.Text.Encoding will give you an encoded ByteString (emphasis on "byte", not "string": a ByteString is a sequence of raw bytes, not a string of characters).
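A small sketch of the explicit encoding step, assuming UTF-8 is the target. The top-level names are illustrative; encodeUtf8 and decodeUtf8 are from Data.Text.Encoding.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- One character: the arrow from the question (U+2192).
arrow :: T.Text
arrow = T.pack "\8594"

-- Encoding turns characters into raw bytes.
utf8Bytes :: BS.ByteString
utf8Bytes = TE.encodeUtf8 arrow

-- The individual bytes of the UTF-8 encoding.
byteValues :: [Word8]
byteValues = BS.unpack utf8Bytes

-- Decoding valid UTF-8 gives the original Text back.
decoded :: T.Text
decoded = TE.decodeUtf8 utf8Bytes
```

The single character becomes three bytes (0xE2, 0x86, 0x92). Note that decodeUtf8 throws an exception on invalid input; decodeUtf8' returns an Either instead and is probably the safer choice for bytes from an untrusted source.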
If you just want to convert from Text to String and back to Text... what's wrong with the backslash encoding? show is not really intended for pretty-printing output for users to read, despite many people's initial expectations otherwise.
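For what it's worth, here is a sketch of the round trip with the escaping left in place, using the Message type from the question. The helpers serialize and deserialize are hypothetical names. It still goes through String via T.unpack, so it does not avoid the unpacking mentioned in the question.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Text.Read (readMaybe)

data Message = Message Text deriving (Read, Show, Eq)

-- Hypothetical helper: serialize via show; the result is ASCII-only Text.
serialize :: Message -> Text
serialize = T.pack . show

-- Hypothetical helper: parse via read's safe variant, which returns
-- Nothing on malformed input instead of throwing.
deserialize :: Text -> Maybe Message
deserialize = readMaybe . T.unpack
```

The escaped form is only an intermediate representation: deserialize (serialize m) gives back Just m, with the arrow stored as a single character again.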
Upvotes: 5