Reputation: 3034
I made an intentional error using a character that seems nonstandard but is available to use:
var ᛨ={};
ᛨ.causeError()
Uncaught TypeError: è.causeError is not a function
Apparently the ᛨ character is being treated as a version of the è character. (è is a normal UTF-8 text character, like a, b, c, as opposed to non-text characters such as ☎, ®, ෴, %.)
è === http://unicode-table.com/en/00E8/

Encoding   hex (bytes)   dec (bytes)   dec          binary
UTF-8      C3 A8         195 168       50088        11000011 10101000
UTF-16BE   00 E8         0 232         232          00000000 11101000
UTF-16LE   E8 00         232 0         59392        11101000 00000000
UTF-32BE   00 00 00 E8   0 0 0 232     232          00000000 00000000 00000000 11101000
UTF-32LE   E8 00 00 00   232 0 0 0     3892314112   11101000 00000000 00000000 00000000
ᛨ === http://unicode-table.com/en/16E8/

Encoding   hex (bytes)   dec (bytes)   dec          binary
UTF-8      E1 9B A8      225 155 168   14785448     11100001 10011011 10101000
UTF-16BE   16 E8         22 232        5864         00010110 11101000
UTF-16LE   E8 16         232 22        59414        11101000 00010110
UTF-32BE   00 00 16 E8   0 0 22 232    5864         00000000 00000000 00010110 11101000
UTF-32LE   E8 16 00 00   232 22 0 0    3893755904   11101000 00010110 00000000 00000000
I don't see the correlation!
How can I test non-standard characters to see if they have a correlation with a normal text character?
What is the relation I would look for?
Out of interest; Is this Unicode issue documented anywhere?
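One way to look for such a correlation is to compare the code points of the two characters (a minimal sketch, not from the original question):

```javascript
// Compare the Unicode code points of the two characters.
const rune  = 'ᛨ';   // U+16E8
const grave = 'è';   // U+00E8

console.log(rune.codePointAt(0).toString(16));  // "16e8"
console.log(grave.codePointAt(0).toString(16)); // "e8"

// The correlation: the low byte of U+16E8 equals the code point of è.
console.log((rune.codePointAt(0) & 0xFF) === grave.codePointAt(0)); // true
```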
[This question, after further thought, isn't completely solved (see comments)]
Upvotes: 5
Views: 171
Reputation: 116100
è 'coincidentally' is character E8 in Western ANSI encodings, which is also the second byte of the UTF-16 code point for your special character (and of è in UTF-16 too, by the way).
If you are working from a source file: you may have saved your file in the wrong encoding, maybe ANSI, probably UTF-16. Make sure your source file is saved in the right encoding. The 'right' encoding can be just about anything (although UTF-8 is recommended), as long as it matches the charset declared in the Content-Type header you send with your file and it can represent every character you want to put in it.
If you are working from the console: if it's just the console messing up, this still explains the issue. Internally, the browser probably uses a different encoding than UTF-8, because UTF-8 is efficient for transmission but not convenient to work with. Most likely it uses UTF-16 (or UCS-2). Your character would then be encoded as the two-byte code point 16 E8. If the console tries to display each byte as a separate character, it will show E8 as 'è' and skip 16 altogether, since 16 is historically an ASCII control character (SYN, Synchronous Idle) and not intended for display at all.
Upvotes: 1
Reputation: 6768
I tested with a variable called ಠ_ಠ, and the error message contains " _ " (space, underscore, space). It looks like the code that writes error messages doesn't support as many characters as it should.
The same issue happens in the console, so it's not a file-encoding problem. Plus, the characters are handled without any issue EXCEPT in automatically generated error messages. Even writing throw new Error("ಠ_ಠ"); works without a problem.
That seems like a rather specific bug, but it affects both Chrome and Firefox.
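A minimal reproduction sketch of the test above (how the identifier appears in the message depends on the engine version, so it is only logged here, not asserted):

```javascript
// Trigger the automatically generated TypeError and inspect its message.
var ಠ_ಠ = {};

try {
  ಠ_ಠ.causeError();
} catch (e) {
  console.log(e instanceof TypeError); // true
  console.log(e.message);              // the identifier may or may not survive intact
}

// Explicitly constructed errors keep the characters intact:
console.log(new Error('ಠ_ಠ').message === 'ಠ_ಠ'); // true
```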
Upvotes: 1