Reputation: 157
I'm reading an HTML web page that contains literal accented words (Spanish):
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Web page</title>
<body>
<p>Título</p>
<p>Año</p>
<p>Ángel</p>
<p>¿por qué nos vamos?</p>
</body>
I'm using HXT:
...
let doc = readDocument [ withValidate no
                       , withInputEncoding iso8859_1
                       , withParseHTML yes
                       , withWarnings no
                       , withEncodingErrors no
                       , withCurl []] url
...
With the option
withInputEncoding utf8
the accented characters are discarded, and I get the following words: Ttulo, Ao, ngel, por qu nos vamos? With the option
withInputEncoding iso8859_1
the accented characters come out as escape sequences, and I get words like: Rom\225ntica, Man\180s, H\233ctor, where \225, \180 and \233 appear as strings, not chars.
What is the best way to handle this in HXT and get all the words unmodified?
Thanks.
Upvotes: 2
Views: 155
Reputation: 13876
I bet you already have everything you need:
Prelude> putStrLn $ read "\"Rom\225ntica\""
Romántica
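It looks like you are looking at the result of show applied to the string, not at the string itself. In other words, \225 is how show renders the single character á (code point 225); the string still contains one Char there. Note that print uses show: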
Prelude> print (read "\"Rom\225ntica\"" :: String)
"Rom\225ntica"
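The same point as a small standalone program, in case it helps. This is only a sketch: the name word is made up for illustration and stands in for a string that HXT has already decoded from ISO-8859-1, and the hSetEncoding call assumes your terminal expects UTF-8.

```haskell
module Main where

import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical stand-in for a word extracted by HXT after decoding:
-- the fourth Char is code point 225 (á), written here as an escape.
word :: String
word = "Rom\225ntica"

main :: IO ()
main = do
  -- Assumption: the terminal expects UTF-8 output.
  hSetEncoding stdout utf8
  -- print goes through show, so non-ASCII characters are escaped.
  print word
  -- putStrLn writes the characters themselves.
  putStrLn word
  -- The escape is one Char, not four: the word has 9 characters.
  print (length word)
  -- Code point of the fourth character.
  print (fromEnum (word !! 3))
```

So if the words only look wrong in GHCi or when printed with print or show, the data is most likely fine, and writing it with putStrLn (or to a file with an explicit encoding) should give the accented text back.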
Upvotes: 6