imei

How do I set the encoding Jsoup uses to read the response?

I'm trying to POST data encoded in Big5, and the response I get looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="zh-TW">

The Java code looks like this:

Document docs = Jsoup.connect(param)
                     .timeout(30000)
                     .postDataCharset("Big5")
                     .data("syear","104")
                     .data("smonth","6")
                     .data("sday","30")
                     .data("eyear","104")
                     .data("emonth","7")
                     .data("eday","17")
                     .data("SectNO", "不限科別")
                     .data("EmpNO", "不限醫生")
                     .post();

How do I set the charset for the posted data and still get a correctly decoded response?

Answers (1)

Stephan

Explanation

As of Jsoup 1.8.3, postDataCharset() sets only the charset used to encode the posted data. That charset is not reused when Jsoup parses the response.
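To see what the posting charset changes on the wire, here is a JDK-only sketch that builds an application/x-www-form-urlencoded body with a chosen charset. It is not Jsoup's code: `FormBodySketch` and `encodeForm` are hypothetical names, and it assumes Java 10+ for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBodySketch {
    // Hypothetical helper: builds an application/x-www-form-urlencoded body,
    // percent-encoding every key and value with the given charset.
    static String encodeForm(Map<String, String> fields, Charset charset) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), charset)
                    + "=" + URLEncoder.encode(field.getValue(), charset));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("syear", "104");
        fields.put("SectNO", "不限科別"); // "any department"

        // Same fields, two different byte sequences sent to the server.
        System.out.println(encodeForm(fields, Charset.forName("Big5")));
        System.out.println(encodeForm(fields, StandardCharsets.UTF_8));
    }
}
```

Only the request body differs between the two calls; nothing here says anything about how the response will be decoded, which is the gap described next.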

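Instead, Jsoup looks for a charset declared by the response itself: in the Content-Type header, or failing that in a <meta> tag (such as <meta http-equiv="Content-Type" ...>) in the HTML. If it finds none, it falls back to UTF-8. In your case the page is Big5, so that assumption is wrong.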
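A simplified, hypothetical model of that lookup order (header, then <meta>, then UTF-8) shows why an undeclared Big5 page comes out garbled. `CharsetFallbackSketch` and `detect` are illustrative names only; Jsoup's real detection is more thorough than these two regexes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFallbackSketch {
    // charset=xxx, optionally quoted, as it appears in a Content-Type value.
    private static final String CHARSET = "charset=[\"']?([\\w-]+)";
    private static final Pattern HEADER = Pattern.compile(CHARSET, Pattern.CASE_INSENSITIVE);
    private static final Pattern META = Pattern.compile("<meta[^>]*" + CHARSET, Pattern.CASE_INSENSITIVE);

    // Hypothetical, simplified lookup: header, then <meta>, then UTF-8.
    static Charset detect(String contentType, byte[] body) {
        if (contentType != null) {
            Matcher header = HEADER.matcher(contentType);
            if (header.find()) {
                return Charset.forName(header.group(1));
            }
        }
        // ISO-8859-1 maps each byte to one char, so ASCII markup stays
        // searchable even when the rest of the page is Big5.
        Matcher meta = META.matcher(new String(body, StandardCharsets.ISO_8859_1));
        if (meta.find()) {
            return Charset.forName(meta.group(1));
        }
        return StandardCharsets.UTF_8; // nothing declared: assume UTF-8
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        byte[] page = "<html lang=\"zh-TW\"><p>不限科別</p></html>".getBytes(big5);

        Charset guessed = detect("text/html", page);   // no charset anywhere
        System.out.println(guessed);                   // UTF-8
        System.out.println(new String(page, guessed)); // Chinese text comes out garbled
        System.out.println(new String(page, big5));    // decoded correctly
    }
}
```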

Workaround

To work around this, don't let Jsoup guess the response encoding for you. Fetch the raw bytes with execute() and decode them yourself:

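import java.nio.charset.Charset;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Let Jsoup fetch the raw response. execute() sends a GET unless told
// otherwise, so set the method to POST explicitly.
Connection.Response res = Jsoup.connect(param)
                               .timeout(30000)
                               .method(Connection.Method.POST)
                               .postDataCharset("Big5")    // encodes the form fields below
                               .data("syear", "104")
                               .data("smonth", "6")
                               .data("sday", "30")
                               .data("eyear", "104")
                               .data("emonth", "7")
                               .data("eday", "17")
                               .data("SectNO", "不限科別")  // "any department"
                               .data("EmpNO", "不限醫生")   // "any doctor"
                               .execute();

// Now tell it explicitly which encoding to use for the response body
Document docs = Jsoup.parse(
        new String(res.bodyAsBytes(), Charset.forName("Big5")),
        param);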
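One caveat: new String(bytes, charset) silently replaces invalid bytes with U+FFFD, so if the server ever stops sending Big5 the page just turns into garbage. A stricter variant, sketched here with the JDK's CharsetDecoder set to report errors, fails loudly instead. `StrictDecodeSketch` and `decodeStrict` are hypothetical names, not part of Jsoup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictDecodeSketch {
    // Decodes a response body with an explicit charset, but throws instead of
    // substituting U+FFFD when the bytes are not valid in that charset.
    static String decodeStrict(byte[] body, String charsetName) {
        try {
            return Charset.forName(charsetName)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Body is not valid " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "不限醫生".getBytes(Charset.forName("Big5"));
        System.out.println(decodeStrict(body, "Big5"));
        try {
            decodeStrict(body, "UTF-8"); // Big5 bytes are not valid UTF-8
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the workaround above it would replace the String constructor: Jsoup.parse(StrictDecodeSketch.decodeStrict(res.bodyAsBytes(), "Big5"), param).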
