Reputation: 337
This is a section from Dive Into Python 3 regarding strings:
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
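The quoted distinction can be sketched in a few lines. This is only an illustration: the sample text and the variable names are assumptions, not part of the book excerpt. It shows the same string producing different byte sequences under two encodings, and the bytes turning back into an equal string when decoded.

```python
# A str is a sequence of Unicode characters; it has no encoding of its own.
text = "caf\u00e9"  # 'café', four characters (sample text chosen for illustration)

# Encoding turns the characters into bytes; the result depends on the encoding.
utf8_bytes = text.encode("utf-8")     # b'caf\xc3\xa9', five bytes
cp1252_bytes = text.encode("cp1252")  # b'caf\xe9', four bytes

# Decoding turns bytes back into characters, given the right encoding.
round_trip = utf8_bytes.decode("utf-8")

print(len(text), len(utf8_bytes), len(cp1252_bytes))  # 4 5 4
print(round_trip == text)                             # True
```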
Earlier today I used the hashlib module and read the help text for md5 that says:
Return a new MD5 hash object; optionally initialized with a string.
Well, it doesn't accept a string - it accepts a bytes object.
Maybe I'm reading too much into this, but wouldn't it make more sense if the help text stated that a bytes object should be used instead? Or are people using the same name for strings and bytes?
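For reference, the behaviour described in the question can be reproduced with a short sketch like the one below. The message text and variable names are made up for illustration, and UTF-8 is just one possible choice of encoding.

```python
import hashlib

message = "hello"  # a str, i.e. Unicode characters (sample value for illustration)

# Passing a str directly is rejected in Python 3.
try:
    hashlib.md5(message)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding the str to bytes first works; the encoding is chosen explicitly.
digest = hashlib.md5(message.encode("utf-8")).hexdigest()

print(str_accepted)  # False
print(digest)        # 5d41402abc4b2a76b9719d911017c592
```

Note that the digest depends on the encoding: the same string encoded differently can yield different bytes and therefore a different hash.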
Upvotes: 5
Views: 535
Reputation: 304365
The help text is probably left over from Python 2.
This is one of the bigger changes from 2 to 3:

Python 2    Python 3
str         bytes
unicode     str
Python 2.6+ starts to prepare for the change by making bytes a synonym of str.
You should report it to the developers (unless it has already been fixed - I only have 3.1.2 here). I think the wording should probably be improved.
Upvotes: 5
Reputation: 288130
In Python 2, str was used both for strings of characters and for bytes. In fact, until Python 2.6, there wasn't even a bytes type (and in 2.6 and 2.7, bytes is str).
The mentioned inconsistencies in the hashlib documentation are an artifact of this history.
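As a rough illustration of where that history ended up, the following sketch checks how the two types relate in Python 3. The variable names are illustrative only. On Python 2.6 or 2.7 the first check would come out the other way, since bytes is str there.

```python
# In Python 3, bytes and str are separate types.
same_type = bytes is str

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = "abc"[0]
first_byte = b"abc"[0]

print(same_type)   # False
print(first_char)  # a
print(first_byte)  # 97
```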
Upvotes: 6