jmelvin

Reputation: 657

Special character 'Â' inserted before copyright symbol

Our source code contains a copyright at the top of every CSS file...

/* Copyright © ... */

Every time CSS files are loaded by the Firefox Style Editor, a special character is inserted before the copyright symbol...

/* Copyright Â© ... */

It adds an additional special character each time the file is loaded. I do not believe this is limited to Firefox, but that's what I use at the moment for CSS dynamic styling. It's annoying to have to delete this character every time, and occasionally it slips into a commit and gets pushed.

Question: How can the special character insertion be prevented?

Upvotes: 20

Views: 18793

Answers (4)

pyknight202

Reputation: 1487

As many of the answers have taught us, the browser doesn't know the encoding of the CSS file, and so the copyright symbol doesn't appear as we would like it to.

If we explicitly define the encoding of the CSS file, we can fix this. Set the first line of your CSS file to the following in order to specify UTF-8 encoding:

@charset "utf-8";

Now a © symbol within any of our CSS comments will display correctly.

Upvotes: 0

Matthias Schulz

Reputation: 51

Check whether you have set a correct charset meta tag in your HTML head:

<meta charset="UTF-8">

Upvotes: 5

user10294571

Reputation:

Instead of using the copyright symbol itself, try its numeric character reference:

&#169;

Upvotes: 11

R. Schreurs

Reputation: 9085

My advice is to open the files in Notepad++ and check the detected encoding, as displayed under the Encoding menu. I expect that it will read:

Encode in UTF-8

If so, apply Convert to UTF-8-BOM. It will prepend 3 magic bytes to your text file, making the UTF-8 encoding explicit. Save the files and see if it works.
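
If there are many files, the same conversion can be scripted. Here is a minimal Python sketch, assuming the files are already valid UTF-8. The function name `add_utf8_bom` is made up for illustration; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prepend the UTF-8 BOM (EF BB BF) if the file lacks one.

    Returns True if the file was changed. Assumes the file is already
    UTF-8 and raises UnicodeDecodeError otherwise, rather than silently
    rewriting a file in some other encoding.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # BOM already present, nothing to do
    data.decode("utf-8")  # validation only; fails loudly on non-UTF-8 input
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

You would call it once per file, e.g. `add_utf8_bom(Path("site.css"))`, where `site.css` is a placeholder name.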

Explanation

The reason for this Â to appear is that some tool is not detecting the encoding correctly and assumes it is ANSI (a.k.a. Windows-1252) or ISO 8859-1. Those one-byte encodings and UTF-8 are very much alike for normal English texts and code files. The standard ASCII set is encoded in exactly the same way. Only special characters, like the copyright symbol in your case, are encoded differently, using two, three or four bytes rather than one.

Now, the copyright symbol has bytes 0xC2 0xA9 or 11000010 10101001 in UTF-8 encoding, and byte 0xA9 in ANSI encoding.

The Latin capital letter A with circumflex (Â) has byte 0xC2 or 11000010 in ANSI encoding.

When 11000010 10101001 is encountered and interpreted as UTF-8, the first three bits of the first byte, 110, in combination with the first two bits of the second byte, 10, indicate a two-byte UTF-8 character. So this is the correct UTF-8 encoding of the copyright symbol.
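
The bit pattern can be checked by hand. Here is a small Python sketch (the function name `decode_two_byte` is invented here, and it skips the overlong-encoding checks a real decoder performs) that strips the 110 and 10 marker bits and joins the remaining payload bits:

```python
def decode_two_byte(b1: int, b2: int) -> int:
    """Return the code point encoded by a two-byte UTF-8 sequence."""
    if b1 >> 5 != 0b110:
        raise ValueError("first byte does not start with 110")
    if b2 >> 6 != 0b10:
        raise ValueError("second byte does not start with 10")
    # 5 payload bits come from the first byte, 6 from the second.
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)


code_point = decode_two_byte(0b11000010, 0b10101001)
print(hex(code_point), chr(code_point))  # 0xa9 ©
```

The payload bits 00010 and 101001 combine to 10101001, which is 0xA9, the code point of the copyright symbol.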

If, however, 11000010 10101001 is encountered and interpreted as ANSI, two separate characters are seen, Â and ©.
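
This can be reproduced in a few lines of Python. The sketch below uses the standard `cp1252` codec for ANSI. The repeated load-and-save loop is only an assumption about what a misbehaving tool might do, not something confirmed for the Firefox Style Editor.

```python
text = "©"
raw = text.encode("utf-8")
print(raw.hex(" "))  # c2 a9

# Reading the UTF-8 bytes as Windows-1252 yields two characters.
wrong = raw.decode("cp1252")
print(wrong)  # Â©

# Hypothetical buggy round trip: read as cp1252, save as UTF-8, repeat.
# Every pass makes the text longer.
for _ in range(2):
    text = text.encode("utf-8").decode("cp1252")
    print(len(text), text)
```

If a tool behaves like the loop, the junk in front of the © grows on every load, which would match the symptom described in the question.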

I think it is no coincidence that the second byte of the UTF-8 encoding of © is the same as the one-byte ANSI encoding. Unicode code points U+0080 to U+00FF follow the order of ISO 8859-1, and for U+00A0 to U+00BF the two-byte UTF-8 form is 0xC2 followed by the same byte value. (From U+00C0 upward the lead byte becomes 0xC3 and the second byte is the Latin-1 value minus 0x40.) E.g. a UTF-8 encoded

µ

would show up as

Âµ

if wrongly interpreted as ANSI.
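
How far the pattern holds can be checked over the upper Latin-1 range. Here is a short Python sketch, using `latin-1` (ISO 8859-1) as the single-byte encoding and comparing it with the trailing UTF-8 byte:

```python
same, shifted = [], []
for cp in range(0xA0, 0x100):
    ch = chr(cp)
    single = ch.encode("latin-1")[0]
    lead, trail = ch.encode("utf-8")  # always two bytes in this range
    if trail == single:
        same.append(cp)
    else:
        shifted.append(cp)

print(f"{same[0]:#x}-{same[-1]:#x}")          # 0xa0-0xbf
print(f"{shifted[0]:#x}-{shifted[-1]:#x}")    # 0xc0-0xff
print("µ".encode("utf-8").decode("latin-1"))  # Âµ
```

So the stray Â followed by the intended character shows up for 0xA0 to 0xBF. For 0xC0 to 0xFF the lead byte is 0xC3, which displays as Ã, and the trailing byte no longer matches.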

Maybe, this was done to preserve some information about the original characters, if an encoding error were made.

Upvotes: 14
