Why are my search results not in the same charset as my page encoding?

Question

I am using UTF-8 encoding for an HTML page.

In the debugger console, document.characterSet returns "UTF-8".

On the page, I have metadata (keywords, description, title) containing a valid UTF-8 character: '®', which is encoded in UTF-8 as the two bytes C2 AE.

The character displays correctly in View Source and in the page title.

But Google and Bing search results show it as 'Â®'. That is, during the web crawl, the text appears to be interpreted as ISO-8859-1 or Windows-1252, so the two bytes C2 and AE are displayed as two separate characters.
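The symptom can be reproduced in a few lines. This is only an illustrative sketch in Python (not something from the original page): it encodes '®' as UTF-8 and then decodes the same bytes as Windows-1252, which turns the single character into two.

```python
# Encode the registered-trademark sign as UTF-8.
utf8_bytes = "\u00ae".encode("utf-8")
print(utf8_bytes.hex())  # c2ae

# Decode those same bytes with the wrong (single-byte) charset.
# Each byte becomes its own character: C2 -> 'Â', AE -> '®'.
misread = utf8_bytes.decode("cp1252")
print(len(misread))  # 2

# The damage is reversible as long as no bytes were lost:
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == "\u00ae")  # True
```

Decoding as ISO-8859-1 ("latin-1" in Python) gives the same two characters for these particular bytes.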

If I replace the character with the HTML entity &reg; (U+00AE), it shows correctly.

Short of converting my metadata to ISO-8859-1, is there a best practice I should be following here?

Tim · Accepted Answer

The issue was on the back end: the data was not being transcoded to UTF-8 properly when it was read from the cache. So I feel the best practice is to use the native UTF-8 BMP character with the proper page encoding, rather than being required to use HTML entity values.
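The answer does not show the back-end code, so the following is only a guess at the general shape of such a bug, written in Python. The cache dictionary, the function names, and the sample text are all hypothetical. It assumes the cache holds raw UTF-8 bytes and shows how decoding them with a legacy charset before re-encoding the page as UTF-8 produces double-encoded output, while decoding them as UTF-8 does not.

```python
# Hypothetical cache: values are stored as raw UTF-8 bytes.
CACHE = {"description": "Acme\u00ae widgets".encode("utf-8")}


def read_cached_buggy(key: str) -> str:
    # Bug: the cached bytes are treated as a single-byte legacy charset.
    return CACHE[key].decode("latin-1")


def read_cached_fixed(key: str) -> str:
    # Fix: decode the cached bytes with the encoding they were written in.
    return CACHE[key].decode("utf-8")


def render_meta(description: str) -> bytes:
    # The page itself is always served as UTF-8.
    html = f'<meta name="description" content="{description}">'
    return html.encode("utf-8")


buggy_page = render_meta(read_cached_buggy("description"))
fixed_page = render_meta(read_cached_fixed("description"))

# Double encoding: C2 AE was read as two characters and re-encoded.
print(b"\xc3\x82\xc2\xae" in buggy_page)  # True
# Correct output keeps the original two-byte sequence only.
print(b"\xc3\x82" in fixed_page)  # False
```

Being explicit about the encoding at every point where bytes become text (cache reads, file reads, database drivers) avoids depending on a platform default.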
