Reputation: 3
I'm working on an assignment question that asks why two strings that look identical can compare as not equal.
The question is stated below:
In a computer program’s code, two string variables are declared. When their respective values are printed by the program onto the computer screen, both appear as the string "ĝ" . However, the program returns false when both variables are checked for their string equivalence (i.e. false means both strings are considered non-identical).
What could be the most likely cause of these seemingly contradictory results? Assume that the UTF-8 encoding is used by the computer program.
The question expects an explanation of why this contradictory result occurs and how UTF-8 encoding works in this scenario.
My current guess is that there is another character that looks like "ĝ" but has a different Unicode representation, but I'm not entirely sure.
Upvotes: 0
Views: 124
Reputation: 177725
Unicode has combining characters, so you could have:
U+011D LATIN SMALL LETTER G WITH CIRCUMFLEX
or:
U+0067 LATIN SMALL LETTER G
U+0302 COMBINING CIRCUMFLEX ACCENT
Visually these will print the same (Python code example):
>>> print('\u011d \u0067\u0302')
ĝ ĝ
FYI, in UTF-8 encoding, that would be hexadecimal bytes C4 9D vs. 67 CC 82.
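To make the comparison concrete, here is a minimal Python sketch of the two spellings above. The helper names utf8_hex and same_text are made up for illustration, and the normalization step (NFC via the standard unicodedata module) goes slightly beyond what the question asks: it is one common way to make such strings compare equal, not the only one.

```python
import unicodedata

precomposed = "\u011d"       # U+011D LATIN SMALL LETTER G WITH CIRCUMFLEX
decomposed = "\u0067\u0302"  # U+0067 g followed by U+0302 COMBINING CIRCUMFLEX ACCENT

def utf8_hex(s):
    """Hypothetical helper: UTF-8 bytes of s as spaced, uppercase hex."""
    return s.encode("utf-8").hex(" ").upper()

def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Both render as the same glyph, but the code point sequences differ.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# The differing code points produce different UTF-8 byte sequences.
print(utf8_hex(precomposed))  # C4 9D
print(utf8_hex(decomposed))   # 67 CC 82

# Normalizing both sides to the same form makes them compare equal.
print(same_text(precomposed, decomposed))  # True
```

A plain == compares code points (or bytes), not what is drawn on screen, which is why the two variables in the assignment can print identically and still be unequal.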
Upvotes: 3