Reputation: 9154
I am reading an HTTP response from a Perl page in a Servlet like this:
public String getHTML(String urlToRead) {
    URL url;
    HttpURLConnection conn;
    BufferedReader rd;
    String line;
    String result = "";
    try {
        url = new URL(urlToRead);
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        while ((line = rd.readLine()) != null) {
            byte[] b = line.getBytes();
            result += new String(b, "UTF-8");
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
I am displaying this result with this code:
response.setContentType("text/plain; charset=UTF-8");
PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"), true);
try {
    String query = request.getParameter("query");
    String type = request.getParameter("type");
    String res = getHTML(url);
    out.write(res);
} finally {
    out.close();
}
But the response is still not encoded as UTF-8. What am I doing wrong?
Thanks in advance.
Upvotes: 2
Views: 9204
Reputation: 4599
In my case, I had to add another configuration.
Previously, I was writing the page this way:
try (PrintStream printStream = new PrintStream(response.getOutputStream())) {
    printStream.print(pageInjecting);
}
I changed to:
try (PrintStream printStream = new PrintStream(response.getOutputStream(), false, "UTF-8")) {
    printStream.print(pageInjecting);
}
Upvotes: 0
Reputation: 1
I also faced the same problem in another scenario, and I believe this will work: use
    byte[] b = line.getBytes(UTF8_CHARSET);
instead of the current call in the while loop:
while ((line = rd.readLine()) != null) {
    byte[] b = line.getBytes(); // NOT UTF-8
    result += new String(b, "UTF-8");
}
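For reference, UTF8_CHARSET above is assumed to be a Charset constant such as java.nio.charset.StandardCharsets.UTF_8; a minimal sketch of the loop with an explicit charset on both conversions:
while ((line = rd.readLine()) != null) {
    // Encode and decode with an explicit charset instead of the platform default,
    // so the byte[] round-trip no longer depends on the JVM's default encoding.
    byte[] b = line.getBytes(StandardCharsets.UTF_8);
    result += new String(b, StandardCharsets.UTF_8);
}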
Upvotes: 0
Reputation: 28648
That call to line.getBytes() looks suspicious. You should probably make it line.getBytes("UTF-8") if you are certain that what is returned is UTF-8 encoded. Additionally, I'm not sure why it is even necessary. A typical approach to getting data out of a BufferedReader is to use a StringBuilder and keep appending each String retrieved from readLine to the result. The conversion back and forth between String and byte[] is unnecessary.
Change result into a StringBuilder and do this:
while ((line = rd.readLine()) != null) {
    result.append(line);
}
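Putting it together, a minimal sketch of getHTML rewritten along these lines, keeping the question's connection setup (a try-with-resources is added for the reader):
public String getHTML(String urlToRead) {
    StringBuilder result = new StringBuilder();
    try {
        URL url = new URL(urlToRead);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        // Decode the response bytes as UTF-8 once, at the reader.
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = rd.readLine()) != null) {
                // No byte[] round-trip: append the already-decoded line.
                // Note that readLine() drops the line terminators.
                result.append(line);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result.toString();
}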
Upvotes: 3
Reputation: 12817
Here is where you break the chain of character encoding conversions:
while ((line = rd.readLine()) != null) {
    byte[] b = line.getBytes(); // NOT UTF-8
    result += new String(b, "UTF-8");
}
From String#getBytes() javadoc:
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array
And the default charset is probably not UTF-8.
But why do all the conversions in the first place? Just read the raw bytes from the source and write the raw bytes to the consumer. It's supposed to be UTF-8 all the way.
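A minimal sketch of that byte-for-byte relay, assuming the servlet only needs to forward the upstream body unchanged (urlToRead and response are named as in the question):
HttpURLConnection conn = (HttpURLConnection) new URL(urlToRead).openConnection();
conn.setRequestMethod("GET");
response.setContentType("text/plain; charset=UTF-8");
// Copy the body byte-for-byte; nothing is decoded or re-encoded,
// so whatever UTF-8 the Perl page sent reaches the client untouched.
try (InputStream in = conn.getInputStream();
     OutputStream out = response.getOutputStream()) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}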
Upvotes: 2