Reputation: 1571
I'm using jsoup to read the following page:
http://valencia.loquo.com/cs/vivienda/piso-en-alquiler/312
Using the following code:
Document doc = Jsoup.connect("http://valencia.loquo.com/cs/vivienda/piso-en-alquiler/312").get();
and I get this error:
java.nio.charset.UnsupportedCharsetException: ISO-LATIN-1
I inspected the HTTP response headers:
Status Code: 200
Date: Sun, 23 Oct 2011 20:10:02 GMT
Content-Encoding: gzip
X-Pad: avoid browser bug
Connection: Keep-Alive
Content-Length: 13890
Server: Apache/2.2.3 (Debian)
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-latin-1
Keep-Alive: timeout=5, max=100
As you can see, the response declares charset=iso-latin-1, which is probably why I get the error. Still, I can read the HTML body of the response. Is there any way to avoid this error and get the document (with the standard charset)?
Thanks in advance for your help
Danilo
Reputation: 298838
You can always download the document without jsoup, convert the encoding programmatically (here's a link to the cookbook), and pass the converted String to jsoup.
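A minimal sketch of that approach, assuming Java 9+ (for `InputStream.readAllBytes`) and plain `java.net.HttpURLConnection` instead of jsoup's own fetcher. The class and method names (`LenientFetch`, `charsetFromContentType`, `decode`, `fetch`) are hypothetical. The idea: read the raw bytes, look at the charset declared in `Content-Type`, and if the JVM does not recognize that name (as with `iso-latin-1`), fall back to ISO-8859-1 instead of throwing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LenientFetch {

    // Pick the charset declared in a Content-Type header such as
    // "text/html; charset=iso-latin-1". If the header is missing, or the JVM
    // does not recognize the declared name, return the given fallback
    // instead of throwing UnsupportedCharsetException.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    if (!name.isEmpty() && Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalCharsetNameException e) {
                    // Malformed name: fall through to the fallback.
                }
            }
        }
        return fallback;
    }

    // Decode raw response bytes; ISO-8859-1 is the assumed fallback for this site.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType, StandardCharsets.ISO_8859_1));
    }

    // Download a page with plain java.net (no jsoup) and decode it ourselves.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return decode(in.readAllBytes(), conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LenientFetch <url>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println(html.length() + " characters decoded");
    }
}
```

The decoded String can then be handed to jsoup with `Jsoup.parse(html, url)`, so jsoup never sees the bad charset label. This sketch does not handle gzip explicitly; `HttpURLConnection` does not ask for compressed responses by default.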
Reputation: 168825
See ISO_8859_1:
ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
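In other words, Java knows this charset under the canonical name ISO-8859-1 (the `StandardCharsets.ISO_8859_1` constant), and the label iso-latin-1 sent by the server is not one of the names the JVM in the question resolves. A minimal sketch of mapping that label by hand, assuming iso-latin-1 is the only bad label you need to cope with; the class `LatinOneAlias` and its `lookup` helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LatinOneAlias {

    // Map the non-standard label "iso-latin-1" onto Java's ISO_8859_1;
    // every other name is resolved normally (and may still throw).
    static Charset lookup(String name) {
        String trimmed = name.trim();
        if (trimmed.toLowerCase(Locale.ROOT).equals("iso-latin-1")) {
            return StandardCharsets.ISO_8859_1;
        }
        return Charset.forName(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(StandardCharsets.ISO_8859_1.name()); // ISO-8859-1
        // Reports whether this JVM itself understands the server's label.
        System.out.println("ISO-LATIN-1 supported: " + Charset.isSupported("ISO-LATIN-1"));
        System.out.println(lookup("ISO-LATIN-1").name());       // ISO-8859-1
    }
}
```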