Tim

Reputation: 1174

Why are my search results not in the same charset as my page encoding?

I am using UTF-8 encoding for an HTML page.

<head>
   <meta charset="utf-8">

In the debugger console, document.characterSet returns "UTF-8".

On the page, I have metadata (keywords, description, title) containing a valid character, '®' (U+00AE), which is encoded in UTF-8 as the two bytes 'c2 ae'.

The character displays correctly in the view source, and on the page title.

But Google and Bing search results are showing it as 'Â®'. That is, during the web crawl, it appears to be getting converted to ISO-8859-1 or Windows-1252, displaying both bytes, 'c2' and 'ae', as separate characters.
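To illustrate the symptom, here is a minimal Python sketch (not part of the original page) that encodes '®' as UTF-8 and then decodes the same two bytes with the wrong charset. Each byte is then shown as its own character.

```python
# The registered-trademark sign and its UTF-8 encoding.
text = "\u00ae"
utf8_bytes = text.encode("utf-8")  # two bytes: c2 ae

# Decoding those bytes as Windows-1252 yields one character per byte.
misread = utf8_bytes.decode("cp1252")

# Decoding with the correct charset recovers the original character.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(), repr(misread), repr(correct))
```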

If I replace the character with the numeric entity &#174; (U+00AE), it shows correctly.

Short of converting my meta data to ISO-8859-1, is there a best practice I should be using for this?

Upvotes: 1

Views: 538

Answers (2)

Tim

Reputation: 1174

The issue was on the back end: the data was not being transcoded to UTF-8 properly when read from the cache. So I feel the best practice is to use the native UTF-8 BMP character with the proper page encoding, rather than being required to use HTML entity values.
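The actual back end is not shown here, so the following is only a sketch of how this kind of bug can arise. It assumes a hypothetical in-memory cache that stores UTF-8 bytes and a hypothetical `read_cached` helper. Decoding the cached bytes with the wrong charset and then serving the result in a UTF-8 page produces the double-encoded bytes that a crawler would index.

```python
# Hypothetical cache holding UTF-8 encoded bytes (an assumption for
# illustration; the real cache from the answer is not described).
cache = {"title": "Acme\u00ae Widgets".encode("utf-8")}


def read_cached(key: str, charset: str = "utf-8") -> str:
    """Decode cached bytes using an explicit charset."""
    return cache[key].decode(charset)


good = read_cached("title")             # decoded as UTF-8
bad = read_cached("title", "latin-1")   # decoded with the wrong charset

# Bytes that would be sent in a UTF-8 page in each case.
print(good.encode("utf-8").hex())
print(bad.encode("utf-8").hex())
```

Passing the charset explicitly at the point where bytes become text makes this class of mistake easier to spot than relying on a platform default.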

Upvotes: 1

Wayne

Reputation: 3519

Look at the page's meta tags and confirm that it is not using this:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

For HTML5 Google recommends:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
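If you want to check this on the served HTML rather than by eye, something like the following could work. It is a rough sketch using Python's standard `html.parser`; the `CharsetFinder` class and `declared_charsets` function are names made up for this example, and it only looks at meta tags, not at the HTTP `Content-Type` header.

```python
import re
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Collect charsets declared in <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # HTML5 form: <meta charset="utf-8">
            self.charsets.append(attr["charset"].lower())
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Older form: charset is embedded in the content attribute.
            match = re.search(r"charset=([\w-]+)", attr.get("content") or "", re.I)
            if match:
                self.charsets.append(match.group(1).lower())


def declared_charsets(html_text):
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charsets


print(declared_charsets('<head><meta charset="utf-8"></head>'))
```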


Note:

<meta charset="">

Another note: some characters, such as & and <, are reserved in HTML and must be replaced with character entities ("HTML entities"). Non-reserved characters such as ® can also be written as entities, e.g.

Character   Description             Entity name   Entity number
&           ampersand               &amp;         &#38;
®           registered trademark    &reg;         &#174;
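As a quick check of the entities listed above, here is a small sketch using Python's standard `html` module. The sample strings are made up for illustration. Note that `html.escape` only rewrites the reserved characters and leaves '®' as it is.

```python
import html

# Both the named and the numeric entity decode to the same character.
decoded = html.unescape("&reg; &#174; &amp; &#38;")

# Escaping replaces reserved characters only; the non-reserved sign is untouched.
escaped = html.escape("AT&T \u00ae")

print(decoded)
print(escaped)
```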

Upvotes: 0
