roqvist

Reputation: 337

Are Python's bytes objects also known as strings?

This is a section from Dive Into Python 3 regarding strings:

In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
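To make the quoted distinction concrete, here is a small sketch (assuming Python 3) that encodes one str into two different byte sequences and decodes each back. The variable names are only illustrative.

```python
# A str holds Unicode characters; encoding turns it into bytes in a chosen encoding.
text = "café"

utf8 = text.encode("utf-8")     # 'é' becomes two bytes in UTF-8
cp1252 = text.encode("cp1252")  # 'é' becomes one byte in CP-1252

print(len(text), len(utf8), len(cp1252))  # 4 5 4
print(utf8)    # b'caf\xc3\xa9'
print(cp1252)  # b'caf\xe9'

# Decoding reverses the process, but only with the matching encoding.
assert utf8.decode("utf-8") == text
assert cp1252.decode("cp1252") == text
```

The same characters produce different bytes depending on the encoding, which is why "is this string utf-8?" has no meaning for a str.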

Earlier today I used the hashlib module and read the help text for md5, which says:

Return a new MD5 hash object; optionally initialized with a string.

Well, it doesn't accept a string (a str); it accepts a bytes object.
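A minimal sketch of the behaviour described above, assuming Python 3: passing a str to hashlib.md5 raises TypeError, while encoding it to bytes first works. The exact error message is not relied on here.

```python
import hashlib

# hashlib constructors accept bytes-like objects, not str.
rejected = False
try:
    hashlib.md5("hello")
except TypeError as exc:
    rejected = True
    print("str rejected:", exc)

# Encode the text explicitly; the chosen encoding determines the digest.
digest = hashlib.md5("hello".encode("utf-8")).hexdigest()
print(digest)  # 5d41402abc4b2a76b9719d911017c592
```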

Maybe I'm reading too much into this, but wouldn't it make more sense if the help text stated that a bytes object should be used instead? Or do people use the same name for strings and bytes?

Upvotes: 5

Views: 535

Answers (2)

John La Rooy

Reputation: 304365

The help text is probably left over from Python 2.

This is one of the bigger changes from Python 2 to Python 3:

    Python2          Python3

    str              bytes
    unicode          str

Python 2.6+ starts preparing for the change by making bytes a synonym (alias) of str.
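For comparison, a short sketch meant to be run on Python 3 (on Python 2.6/2.7, bytes is str would be True instead). It shows that bytes and str are separate types there.

```python
# On Python 3, bytes and str are two distinct types.
print(bytes is str)  # False

data = b"abc"
text = "abc"
print(type(data).__name__, type(text).__name__)  # bytes str

# bytes never compare equal to str, and indexing bytes yields an int.
print(data == text)  # False
print(data[0])       # 97

# Converting between them requires an explicit encode/decode.
assert text.encode("ascii") == data
assert data.decode("ascii") == text
```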

You should report it to the developers (unless it has already been fixed; I only have 3.1.2 here). I think the wording should probably be improved.

Upvotes: 5

phihag

Reputation: 288130

In Python 2, str was used both for strings of characters and for bytes. In fact, until Python 2.6 there wasn't even a bytes type (and in 2.6 and 2.7, bytes is str).

The inconsistency you mention in the hashlib documentation is an artifact of this history.

Upvotes: 6
