Reputation: 1873
When I read Python 2's official page on Unicode, it says:
Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled.
What does the above sentence mean? Could it mean that Python 2 has its own special encoding of Unicode? If so, why not just use UTF-8?
Upvotes: 1
Views: 108
Reputation: 17415
This statement simply means that the underlying C code supports both representations and that one of them is selected when the interpreter is built. Which one you get typically depends on user choice, the compiler and the operating system.
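If you want to see which variant a given interpreter was built with, a minimal sketch is to look at sys.maxunicode, which is 0xFFFF on a 16-bit ("narrow") build and 0x10FFFF on a 32-bit ("wide") build. The helper name unicode_build is made up for illustration. Also note that from Python 3.3 on the build-time distinction is gone and sys.maxunicode is always 0x10FFFF, so this is only informative on older interpreters.

```python
import sys


def unicode_build(max_code_point=None):
    """Classify a build as 'narrow' or 'wide'.

    max_code_point defaults to sys.maxunicode, the largest code point
    a single string item can hold on the running interpreter.
    (Hypothetical helper, not part of any Python API.)
    """
    if max_code_point is None:
        max_code_point = sys.maxunicode
    # 0xFFFF: only the BMP fits in one item, i.e. 16-bit code units.
    # 0x10FFFF: every Unicode code point fits in one item.
    return "narrow" if max_code_point <= 0xFFFF else "wide"


print(unicode_build())
```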
Now, for the possible rationale for that, there are reasons not to use UTF-8:
wchar_t is actually a 16-bit type on some platforms (notably Windows), so if you wanted to use a 32-bit type there you would have to reimplement all kinds of functions for your self-invented character type. Dropping support for anything above the Unicode BMP, or leaking surrogate sequences into the Python API, is a reasonable compromise then (but one that unfortunately sticks).
Note that those are possible reasons; I don't claim that these apply to Python's implementation.
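To make "leaking surrogate sequences" concrete: with 16-bit code units, a code point above the BMP has to be split into two units (a high and a low surrogate), so a string holding one such character can appear to have length 2. The sketch below just does the standard UTF-16 arithmetic by hand; the function name to_surrogate_pair is made up for illustration and is not something Python exposes.

```python
def to_surrogate_pair(code_point):
    """Split a code point above the BMP into its UTF-16 surrogate pair.

    Hypothetical helper that mirrors the UTF-16 encoding rule.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000       # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


# U+1F600 needs two 16-bit units on a narrow build.
high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))
```

On a 32-bit build the same character is a single item, which is why the result of len() on such a string could differ between builds.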
Upvotes: 4