user7711283

Python3 surprising behavior of identifier being a non-ASCII Unicode character

The following code runs without an assertion error:

K = 'K'
𝕂 = '𝕂'
𝚱 = '𝚱'
π”Ž = 'π”Ž'
𝕢 = '𝕢'
π“š = 'π“š'
α΄· = 'α΄·'
assert K == 𝕂 == π”Ž == 𝕢 == π“š == α΄·
print(f'{K=}, {𝕂=}, {𝚱=}, {𝕢=}, {π”Ž=}, {π“š=}')

and prints

K='ᴷ', 𝕂='ᴷ', 𝚱='𝚱', 𝕶='ᴷ', 𝔎='ᴷ', 𝓚='ᴷ'

I am aware of https://peps.python.org/pep-3131/ and I have read the Python documentation on identifiers (https://docs.python.org/3/reference/lexical_analysis.html#identifiers), but I haven't found anything that explains the behavior I observed.

So my question is: what is wrong with my expectation that the values of all the other, visually distinct identifiers don't change when a new value is assigned to one of them?

UPDATE: Taking the comments and answers available so far into account, I need to explain more about what I would consider a satisfying answer to my question:

The hint about the NFKC conversion behind the comparison of identifier names helps me understand how the observed behavior arises, but it still leaves open the question of what the deeper reason is for choosing different approaches to comparing Unicode strings depending on the context in which they occur.

The way strings are compared with each other as string literals apparently differs from the way the same strings are compared when they specify identifier names.

What am I still missing that would let me see the deeper reason why it was decided that Unicode strings representing identifier names in Python are not compared with each other the same way as Unicode strings representing string literals?

If I understand it correctly, Unicode makes it possible to specify the same expected outcome ambiguously, using either one code point representing a complex character or multiple code points consisting of a suitable base character plus its modifiers. Normalization of a Unicode string is then an attempt to resolve the mess caused by introducing this ambiguity in the first place. But that is Unicode-specific business which, in my eyes, mainly affects Unicode visualization tools such as viewers and editors. What a programming language that represents a string as a list of integer values (Unicode code points) larger than 255 actually implements is another matter, isn't it?
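To make that ambiguity concrete (a small illustrative sketch using 'é' instead of the K variants above): the precomposed and the decomposed spelling render the same, but as Python strings they are different sequences of code points, and only normalization makes them compare equal:

from unicodedata import normalize as normal

composed   = '\u00e9'   # 'é' as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = 'e\u0301'  # 'e' followed by COMBINING ACUTE ACCENT

print(composed, decomposed)                  # both render as é
print(len(composed), len(decomposed))        # 1 2
print([f'{ord(c):X}' for c in composed])     # ['E9']
print([f'{ord(c):X}' for c in decomposed])   # ['65', '301']
print(composed == decomposed)                                  # False: plain code-point comparison
print(normal('NFKC', composed) == normal('NFKC', decomposed))  # True after normalization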

Below are some further attempts to find a better wording for the question I am trying to get answered:

What is the advantage of making it possible for two different Unicode strings to be considered not different when they are used as names of Python identifiers?

What is the actual feature behind behavior that, to me, doesn't make sense because it breaks the WYSIWYG principle?

Below is some more code illustrating what is going on and demonstrating the difference between comparing string literals and comparing identifier names that originate from the same strings as those literals:

from unicodedata import normalize as normal
itisasitisRepr = [                char       for char in ['K', '𝕂', '𝚱', '𝔎', '𝕶', '𝓚', 'ᴷ']]
hexintasisRepr = [         f'{ord(char):5X}' for char in itisasitisRepr]
normalizedRepr = [ normal('NFKC', char)      for char in itisasitisRepr]
hexintnormRepr = [         f'{ord(char):5X}' for char in normalizedRepr]
print(itisasitisRepr)
print(hexintasisRepr)
print(normalizedRepr)
print(hexintnormRepr)
print(f"{              'K' ==              '𝕂'  = }")
print(f"{normal('NFKC','K')==normal('NFKC','𝕂') = }")
print(ᴷ == 𝓚, 'ᴷ' == '𝓚') # gives: True False

gives:

['K', '𝕂', '𝚱', '𝔎', '𝕶', '𝓚', 'ᴷ']
['   4B', '1D542', '1D6B1', '1D50E', '1D576', '1D4DA', ' 1D37']
['K', 'K', 'Κ', 'K', 'K', 'K', 'K']
['   4B', '   4B', '  39A', '   4B', '   4B', '   4B', '   4B']
              'K' ==              '𝕂'  = False
normal('NFKC','K')==normal('NFKC','𝕂') = True

Upvotes: 4

Views: 148

Answers (1)

paxdiablo

Reputation: 881643

Python identifiers with non-ASCII characters are subject to NFKC normalisation(1); you can see the effect in the following code:

import unicodedata
for char in ['K', '𝕂', '𝚱', '𝔎', '𝕶', '𝓚', 'ᴷ']:
    normalised_char = unicodedata.normalize('NFKC', char)
    print(char, normalised_char, ord(normalised_char))

The output of that is:

K K 75
𝕂 K 75
𝚱 Κ 922
π”Ž K 75
𝕢 K 75
π“š K 75
α΄· K 75

This shows that all but one of those are the same identifier, which is why your assert passes (it leaves out the one different identifier) and why most of them appear to have the same value. It's really no different from the following code, in which it is hopefully immediately clear what will happen:

a = '1'
a = '2'
b = '3'
a = '4'
a = '5'
a = '6'
a = '7'
assert a == a == a == a == a == a             # passes
print(f'{a=}, {a=}, {b=}, {a=}, {a=}, {a=}')  # a='7', a='7', b='3', a='7', a='7', a='7'
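You can also check the namespace directly to see that it really is one identifier; a quick sketch (assuming it is run as module-level code so the binding shows up in globals()):

import unicodedata

𝕂 = 42                    # the double-struck K from the question
print('K' in globals())   # True: the identifier was normalised to plain 'K' while parsing
print('𝕂' in globals())   # False: string literals are not normalised
print(globals()['K'])     # 42
print(unicodedata.normalize('NFKC', '𝕂') in globals())  # True: normalising the literal finds it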

In response to your update, specifically the text:

What is the advantage of making it possible for two different Unicode strings to be considered not different when they are used as names of Python identifiers?

My own particular viewpoint as a developer is that I want to be able to look at code and understand it. That's not going to be easy when different code points map to similar or even identical graphemes(2), such as with:

Ω = 1
Ω = 2
Ω = Ω + Ω
print(Ω * Ω)

What would you expect from that code? You set omega to one, then two. You then double it to four, and print the square which is sixteen. Easy, right?

And, in actual fact, that's what you do get in Python, despite the fact that there are both omega and ohm characters in that code, and that's because they normalise to the same identifier. Were they not normalised, you would instead have the equivalent of:

omega = 1
ohm = 2
ohm = omega + ohm
print(ohm * ohm)

And this would output nine rather than sixteen. Best of luck debugging that when you can't see a difference between the omega and ohm identifiers :-)
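If you suspect that kind of trickery, one way to make it visible is to look at the raw tokens of the source, since the tokenize module still sees the original code points; a rough sketch, with the two look-alike code points written as escapes so they are explicit:

import io
import tokenize
import unicodedata

source = '\u03a9 = 1\n\u2126 = 2\n\u2126 = \u03a9 + \u2126\nprint(\u2126 * \u2126)\n'

# Report every non-ASCII character appearing in an identifier (NAME) token.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type == tokenize.NAME:
        for ch in tok.string:
            if not ch.isascii():
                print(f'line {tok.start[0]}: U+{ord(ch):04X} {unicodedata.name(ch)}')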

There are also diacritics that can have different representations, such as ḋ:

  • U+1E0B (Latin Small Letter D with Dot Above).
  • U+0064, U+0307 (Latin Small Letter D, Combining Dot Above).

And this may get even more complex when a base letter can have multiple diacritics, such as ậ, ç̇, or ė́. The order of combining marks may be arbitrary, meaning that there could be many ways of representing the ậç̇ė́ variable (two by two by two gives eight, but there are potentially more, since distinct code points also exist for "half-accented" characters like ç).
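A short sketch of that point (the character is just an example): the precomposed ậ and two decomposed spellings that differ only in the order of the combining marks all end up as the same single code point after normalisation:

import unicodedata

forms = [
    '\u1ead',            # ậ as one code point
    'a\u0323\u0302',     # a + COMBINING DOT BELOW + COMBINING CIRCUMFLEX ACCENT
    'a\u0302\u0323',     # a + COMBINING CIRCUMFLEX ACCENT + COMBINING DOT BELOW
]
for form in forms:
    normalised = unicodedata.normalize('NFKC', form)
    print([f'U+{ord(c):04X}' for c in form], '->', [f'U+{ord(c):04X}' for c in normalised])
# All three lines end with ['U+1EAD']: a single code point.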

No, I think I very much appreciate the normalisation that happens to Python identifiers :-)


(1) From the Python docs about identifiers:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.


(2) You can think of graphemes as the basic unit of writing (like a letter), similar to phonemes being the basic unit of speech (like a sound). So the English grapheme c has at least two phonemes, the hard-c in cook and the soft-c in ice.

And, making matters even more complex, cook shows that there is one phoneme (hard-c) giving two separate graphemes, c and k.

Now think how much more complex it gets when you introduce every other language on the planet; I'm surprised the members of the Unicode Consortium don't go absolutely insane :-)

Upvotes: 9
