Where is the Â (C2) coming from?

Question

For some reason a piece of code replaces spaces with \u00A0, i.e. a non-breaking space. This code is then used to sanitize a URL (yes, I know that is very bad, in many ways). Strangely, when these values are displayed in my test JSP, a rogue Â appears. Why?

Sample JSP to demonstrate the issue.

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<html>
  <head>
    <title>JSP Page</title>
    <%
      String[] parameters = request.getParameterValues("p");
      if (parameters == null || parameters.length == 0) {
        parameters = new String[]{""};
      }
    %>
  </head>
  <body>
    <h1>Hello World!</h1>
    <a href="…">A Link</a>
    <%=parameters[0]%>
  </body>
</html>

Why is the parameter showing as HelloÂ there? Where is the C2 coming from?

Added

BTW: The hex of the parameter is 48 65 6c 6c 6f c2 a0 74 68 65 72 65 showing the c2 in-situ.

Erwin Smout · Accepted Answer

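A rogue Â is most often an indication that something got encoded using UTF-8 and then decoded again using a "traditional" code-page character set, e.g. ISO-8859-1, or CP850, or ... In your case, U+00A0 is the two bytes C2 A0 in UTF-8; read back as ISO-8859-1, the C2 displays as Â and the A0 as the non-breaking space.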
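To see the effect in isolation, here is a minimal Java sketch of that encode/decode mismatch. It is an illustration only, not the code from the question: the class and method names (MojibakeDemo, misdecode, repair, hex) are made up, and it assumes the wrong decoder is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // This is the assumed mismatch; the real culprit may be another code page.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mismatch: recover the original bytes via ISO-8859-1,
    // then decode them as the UTF-8 they really were.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Hello\u00A0there";

        // Prints: 48 65 6c 6c 6f c2 a0 74 68 65 72 65
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));

        String garbled = misdecode(original);
        // The garbled string has one extra character: Â (U+00C2) before the NBSP.
        System.out.println(garbled.equals("Hello\u00C2\u00A0there")); // true
        System.out.println(repair(garbled).equals(original));          // true
    }
}
```

The hex line matches the dump in the question, which suggests the bytes on the wire are correct UTF-8 and the problem lies in whichever step decodes them. Repairing after the fact, as repair does, only works when the wrong charset maps every byte to a character; making the decoding side use UTF-8 in the first place is the more robust fix.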
