Reputation: 5230
Edit: I'm talking about behavior in Python 2.7.
The chr function converts integers between 0 and 127 into the corresponding ASCII characters, e.g.
>>> chr(65)
'A'
I get how this is useful in certain situations and I understand why it covers 0..127, the 7-bit ASCII range.
The function also takes arguments from 128..255. For these numbers, it simply returns the hexadecimal representation of the argument. In this range, different bytes mean different things depending on which part of the ISO-8859 standard is used.
I'd understand if chr took another argument, e.g.
>>> chr(228, encoding='iso-8859-1') # hypothetical
'ä'
However, there is no such option:
chr(i) -> character
Return a string of one character with ordinal i; 0 <= i < 256.
My question is: what is the point of raising ValueError for i > 255 instead of i > 127? All the function does for 128 <= i < 256 is return hex values.
Upvotes: 10
Views: 19969
Reputation: 184345
In Python 2.x, a str is a sequence of bytes, so chr() returns a string of one byte and accepts values in the range 0-255, as this is the range that can be represented by a byte. When you print the repr() of a string with a byte in the range 128-255, the byte is shown in escape format (e.g. '\xc8' for chr(200)) because there is no standard way to represent such characters (ASCII defines only 0-127). You can, however, convert it to Unicode using unicode() and specify the source encoding:
unicode(chr(200), encoding="latin1")
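To make this concrete, here is a rough sketch showing that the "hex value" is only how repr() displays a single byte, and that the same byte decodes to different characters under different parts of ISO-8859. The helpers byte_string and decode_byte are hypothetical names made up for this example, and bytes(bytearray([...])) is used in place of chr() only so the snippet behaves the same on Python 2 and Python 3:

```python
def byte_string(value):
    """Return a string of exactly one byte (what chr(value) gives on Python 2)."""
    # bytearray rejects values outside 0..255 with a ValueError
    return bytes(bytearray([value]))


def decode_byte(value, encoding):
    """Interpret a single byte value as text in the given encoding."""
    return byte_string(value).decode(encoding)


raw = byte_string(228)
print(len(raw))    # the string holds one byte, not four characters
print(repr(raw))   # repr() shows that byte in escape form

# The same byte maps to a different character in each encoding.
for name in ("iso-8859-1", "iso-8859-5", "iso-8859-7"):
    print("%s -> %r" % (name, decode_byte(228, name)))
```

Byte 228 comes out as U+00E4 (ä) under ISO-8859-1 and as U+0444 (Cyrillic ф) under ISO-8859-5, which is why chr() cannot pick a character for you without knowing the encoding.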
In Python 3.x, str is a sequence of Unicode characters and chr() takes a much larger range. Bytes are handled by the bytes type.
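For contrast, a small sketch of the Python 3 split between str and bytes. This one needs Python 3, and the variable names are only for illustration:

```python
# Python 3 only: chr() produces text, bytes() produces raw bytes.
text = chr(228)      # a str holding one code point, U+00E4
raw = bytes([228])   # a bytes object holding one byte, 0xE4

# The two are related only through an explicit encoding.
latin1 = text.encode("latin-1")   # one byte
utf8 = text.encode("utf-8")       # two bytes for the same character

print(latin1 == raw)
print(len(latin1), len(utf8))

# chr() accepts any Unicode code point, but nothing beyond U+10FFFF.
out_of_range = False
try:
    chr(0x110000)
except ValueError:
    out_of_range = True
print(out_of_range)
```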
Upvotes: 11
Reputation: 8030
Note that Python 2 string handling is broken. It's one of the reasons I recommend switching to Python 3.
In Python 2, the string type was designed to represent both text and binary strings. So chr() is used to convert an integer to a byte. It's not really related to text, ASCII, or ISO-8859-1. The result is a binary stream of bytes:
binary_command = chr(100) + chr(200) + chr(10)  # three raw bytes: 100, 200, 10
device.write(binary_command)
# etc.
In Python 2.6 and later, the name bytes was added for forward compatibility with Python 3; in Python 2 it is simply an alias for str.
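If you want the same three-byte command without going through chr(), something along these lines should work on both Python 2 and Python 3. The io.BytesIO object is only a stand-in for whatever device you are actually writing to, and command_values is a name invented for this sketch:

```python
import io
import struct

command_values = [100, 200, 10]

# Build the byte string directly from the integers.
binary_command = bytes(bytearray(command_values))

# struct.pack with three unsigned bytes ("3B") gives the same result.
packed_command = struct.pack("3B", *command_values)
print(binary_command == packed_command)

# Stand-in for the real device: any binary file-like object with write().
device = io.BytesIO()
device.write(binary_command)
print(len(device.getvalue()))
```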
Upvotes: 0
Reputation: 122486
I see what you're saying but it isn't correct. In Python 3.4 chr is documented as:
Return the string representing a character whose Unicode codepoint is the integer i.
And here are some examples:
>>> chr(15000)
'㪘'
>>> chr(5000)
'ᎈ'
In Python 2.x it was:
Return a string of one character whose ASCII code is the integer i.
The function chr has been around for a long time in Python and I think the understanding of various encodings only developed in recent releases. In that sense it makes sense to support the basic ASCII table and return hex values for the extended ASCII set within the 128-255 range.
Even within Unicode the ASCII set is only defined as 128 characters, not 256, so there isn't (wasn't) a standard and accepted way of letting chr() return an answer for those input values.
Upvotes: 0