nyxneha

Reputation: 11

Junk Characters being added to string when reading HTTP request parameters

I have an HTML form:

<p> Select beer characteristics </p>
<p> 
  Color: 
  <select name="color" size="1">
    <option value="light"> light </option>
    <option value="amber"> amber </option>
    <option value="brown"> brown </option>
    <option value="dark"> dark </option>
  </select>
  <br><br> 
</p>
<input type="submit" value="submit">

Any suggestions?

Upvotes: 1

Views: 3921

Answers (4)

Shreyash Samani

Reputation: 48

I faced the same problem while converting XHTML to PDF using the wkhtmltopdf tool.

Adding <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in my HTML template resolved the issue.

Upvotes: 0

biziclop

Reputation: 49814

You're using the wrong kind of quote characters in your HTML code.

What you probably have is something like this:

<option value=“light“>

Unless you enclose an attribute value in straight double quotes (") or straight single quotes ('), the browser treats the curly quotes as part of the value: it reads the value as “light“ and not light, and that's what it sends to the server.

(Note that this wouldn't be valid in XHTML, where only quoted attributes are allowed, but in plain HTML specifying attributes in a <foo bar=value> format works.)

The strange output can be explained by the fact that your browser and your server use different encodings: one uses ISO-8859-1 and the other UTF-8. The UTF-8 sequence for the left double quotation mark character is 0xe2 0x80 0x9c, which, when read as ISO-8859-1, gives exactly the two characters you mention. (The third one falls in an unused block and is dropped silently.)

This is a separate problem that needs to be fixed as well; see the other answers for tips on dealing with it.
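
The byte-level explanation above can be checked with a few lines of Java. This is only a sketch: the class and method names (`QuoteMojibakeDemo`, `utf8Hex`, `misread`) are made up for illustration, and it shows how Java's own ISO-8859-1 decoder behaves, which may differ from what a particular browser or server does with the second and third bytes.

```java
import java.nio.charset.StandardCharsets;

public class QuoteMojibakeDemo {

    // Encode the text as UTF-8 and render the bytes as lowercase hex.
    public static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Encode the text as UTF-8, then decode those bytes with the wrong charset.
    public static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String leftQuote = "\u201C"; // LEFT DOUBLE QUOTATION MARK

        System.out.println(utf8Hex(leftQuote)); // prints "e2 80 9c"

        // Print code points instead of raw characters, because two of the
        // three resulting characters are control characters with no glyph.
        String garbled = misread(leftQuote);
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
    }
}
```

In Java, the misread string has three characters: U+00E2 (â) followed by the control characters U+0080 and U+009C, which most displays do not render visibly.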

Upvotes: 2

J4v4

Reputation: 810

I am quite sure that this is related to character encoding or URL encoding mismatches.

First of all, make sure to specify a charset on the form:

<form action="..." method="..." accept-charset="UTF-8">
    <select ...> ... </select>
</form>

If the client sends everything correctly in a good encoding (UTF-8), you also have to configure your server side to read the data with that same encoding.

I don't know what you're using, but one method is:

URLDecoder.decode(formParams, "UTF-8");
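
To see why the charset argument matters, here is a small sketch that decodes the same percent-encoded value twice. The class name `FormParamDecodeDemo`, the helper `decode`, and the sample value are assumptions for illustration; only `URLDecoder.decode(String, String)` comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormParamDecodeDemo {

    // Decode one percent-encoded form value using the named charset.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical value: "light" wrapped in two left curly quotes,
        // as a browser submitting the form in UTF-8 would encode it.
        String encoded = "%E2%80%9Clight%E2%80%9C";

        String asUtf8 = decode(encoded, "UTF-8");
        String asLatin1 = decode(encoded, "ISO-8859-1");

        // Each curly quote is one character when decoded as UTF-8,
        // but three characters when decoded as ISO-8859-1.
        System.out.println(asUtf8.length());   // prints 7
        System.out.println(asLatin1.length()); // prints 11
    }
}
```

If the decoded value comes back longer than expected, the charset passed to the decoder probably does not match the one the browser used.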

To be sure, you can add an encoding to your HTML file as well:

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    ...
</head>

Edit: make sure everything is sent and received with the correct encoding as well.

Sending the HTML file from the server:

Make sure the response carries this header:
Content-Type: text/html; charset=UTF-8

If you're sending a file, make sure to save your file using the UTF-8 encoding. If your HTML is a generated String, use:

PrintWriter writer = new PrintWriter(new OutputStreamWriter(httpOutputStream, "UTF-8"));
writer.print(string);
...
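
A self-contained version of the writer snippet above might look like the following. It is a sketch under assumptions: a `ByteArrayOutputStream` stands in for the real HTTP output stream, and the class and method names (`Utf8ResponseBodyDemo`, `render`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8ResponseBodyDemo {

    // Write the generated HTML through a UTF-8 writer and return the raw bytes.
    public static byte[] render(String html) {
        // Stand-in for the HTTP output stream of a real server.
        ByteArrayOutputStream httpOutputStream = new ByteArrayOutputStream();
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(httpOutputStream, StandardCharsets.UTF_8));
        writer.print(html);
        // Without a flush, buffered characters may never reach the stream.
        writer.flush();
        return httpOutputStream.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = render("<p>\u201Clight\u201C</p>");
        // 7 ASCII bytes for the tags, 5 for "light", 3 for each curly quote.
        System.out.println(body.length); // prints 18
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which can differ between machines.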

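The URL-encoded data in the request contains only US-ASCII characters, so the raw bytes can safely be turned into a String first (UTF-8 is a superset of US-ASCII); the percent-escapes are then decoded as UTF-8: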

String urlEncodedString = new String(receivedBytes, "UTF-8");
String decoded = URLDecoder.decode(urlEncodedString, "UTF-8");
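
Putting the two steps together for a whole form body could look roughly like this. It is a simplified sketch, not a full parser: `FormBodyParserDemo` and `parse` are hypothetical names, the sample body is made up, and repeated parameter names simply overwrite each other.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyParserDemo {

    // Split a URL-encoded body such as "a=1&b=2" and decode names and values as UTF-8.
    public static Map<String, String> parse(byte[] receivedBytes) {
        // The encoded form only contains US-ASCII characters.
        String body = new String(receivedBytes, StandardCharsets.US_ASCII);
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String rawName = eq < 0 ? pair : pair.substring(0, eq);
                String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(rawName, "UTF-8"),
                           URLDecoder.decode(rawValue, "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical request body for the form in the question.
        byte[] received = "color=dark&note=caf%C3%A9".getBytes(StandardCharsets.US_ASCII);
        Map<String, String> params = parse(received);
        System.out.println(params.get("color")); // prints "dark"
    }
}
```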

Upvotes: 2

Marcin Szawurski

Reputation: 1333

This is the result of the browser using the wrong encoding, most probably because no charset is set on the response. You can try:

response.setContentType("text/html; charset=UTF-8");

Upvotes: 2
