Ben Muircroft

Reputation: 3034

Apparently some non-standard characters are treated as regular characters

I trigger an intentional error using a character that seems non-standard but is available to use:

var ᛨ={};
ᛨ.causeError()

Uncaught TypeError: è.causeError is not a function

Apparently the ᛨ character is treated as a version of the è character

(a normal UTF-8 text character, such as a, b, c)

vs

(a non-text character, such as ☎, ®, ෴, %)

è === http://unicode-table.com/en/00E8/

Encoding      hex             dec (bytes)    dec            binary
UTF-8         C3 A8           195 168        50088          11000011 10101000
UTF-16BE      00 E8           0 232          232            00000000 11101000
UTF-16LE      E8 00           232 0          59392          11101000 00000000
UTF-32BE      00 00 00 E8     0 0 0 232      232            00000000 00000000 00000000 11101000
UTF-32LE      E8 00 00 00     232 0 0 0      3892314112     11101000 00000000 00000000 00000000

ᛨ === http://unicode-table.com/en/16E8/

Encoding      hex            dec (bytes)     dec            binary
UTF-8         E1 9B A8       225 155 168     14785448       11100001 10011011 10101000
UTF-16BE      16 E8          22 232          5864           00010110 11101000
UTF-16LE      E8 16          232 22          59414          11101000 00010110
UTF-32BE      00 00 16 E8    0 0 22 232      5864           00000000 00000000 00010110 11101000
UTF-32LE      E8 16 00 00    232 22 0 0      3893755904     11101000 00010110 00000000 00000000

I don't see the correlation!

How can I test non-standard characters to see if they have a correlation with a normal text character?

What is the relation I would look for?

Out of interest: is this Unicode issue documented anywhere?

[This question, after further thought, isn't completely solved (see comments)]

Upvotes: 5

Views: 171

Answers (2)

GolezTrol

Reputation: 116100

è 'coincidentally' is character E8 in Western ANSI encodings, which is also the second byte of the UTF-16 (big-endian) encoding of your special character, 16 E8 (and of è in UTF-16 too, by the way).

If you are working from a source file: You may have saved your file in the wrong encoding, maybe in ANSI, probably in UTF-16. Make sure your source file is saved in the right encoding. The 'right' encoding can be just about anything (although UTF-8 is recommended), as long as it matches the charset you declare for the file (for example in the Content-Type header you send with it) and it can contain every character you want to put in it.
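One way to check the file-encoding theory is to look at the bytes the character would have in a UTF-8 file. The sketch below assumes an environment with the standard TextEncoder (browsers and recent Node.js); utf8Hex is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: the UTF-8 bytes of a string as lowercase hex pairs.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(bytes, b => b.toString(16).padStart(2, '0'));
}

console.log(utf8Hex('\u16E8').join(' ')); // e1 9b a8  (the rune from the question)
console.log(utf8Hex('\u00E8').join(' ')); // c3 a8     (è)
```

None of the rune's UTF-8 bytes is E8, so a UTF-8 file misread as a single-byte encoding would be expected to show three stray characters rather than a single è. The E8 byte only appears in the UTF-16 forms.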

If you are working from the console: If it's just the console messing up, this still explains the issue. Internally, the browser will probably use a different encoding than UTF-8, because UTF-8 is efficient for transmission, but not convenient to work with. Most likely it uses UTF-16 (or UCS-2). Your character would then be encoded as the two-byte code unit 16 E8. If the console tries to display each byte as a separate character, it will show E8 as 'è' and skip 16 altogether, since it is historically an ASCII control character (SYN, for Synchronous Idle) and not intended for display at all.
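The relation described here can be tested directly: keep only the low byte of each UTF-16 code unit and see what character is left. This is a minimal sketch of that guess, not a description of what any browser actually does internally; lowBytesOnly is a hypothetical helper name.

```javascript
// Hypothetical helper: keep only the low byte of every UTF-16 code unit,
// mimicking the byte-wise truncation guessed at above.
function lowBytesOnly(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) & 0xFF);
  }
  return out;
}

console.log('\u16E8'.charCodeAt(0).toString(16));  // 16e8
console.log(lowBytesOnly('\u16E8') === '\u00E8');  // true: the rune collapses to è
```

To test any other character for the same kind of "correlation", pass it to lowBytesOnly and compare the result with the character shown in the error message.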

Upvotes: 1

Domino

Reputation: 6768

I tested with a variable called ಠ_ಠ, and the error message contains " _ " (space underscore space). It looks like the code that writes error messages doesn't support as many characters as it should.

The same issue happens in the console, so it's not a file encoding problem. Plus the characters are managed without any issue EXCEPT in automated error messages. Even writing throw new Error("ಠ_ಠ"); works without a problem.

That seems like a rather specific bug, but it affects both Chrome and Firefox.
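To see exactly which characters end up in an automated error message, the message can be caught and dumped as code points. This is only a diagnostic sketch: codePointsOf and messageFromBadCall are hypothetical helper names, and the second output depends on the engine and version you run it in.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePointsOf(text) {
  return Array.from(text, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Hypothetical helper: provoke the same TypeError and return its message.
function messageFromBadCall() {
  var ಠ_ಠ = {};
  try {
    ಠ_ಠ.causeError();
  } catch (e) {
    return e.message;
  }
}

console.log(codePointsOf('ಠ_ಠ').join(' '));                // U+0CA0 U+005F U+0CA0
console.log(codePointsOf(messageFromBadCall()).join(' ')); // engine-dependent
```

If the variable name survives intact, the second line starts with U+0CA0. If U+00A0 (a no-break space, which looks like an ordinary space) shows up in its place, only the low byte of each character made it into the message, which would match the " _ " seen above.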

Upvotes: 1
