Reputation: 77029
I have a Python 2 codebase that makes extensive use of str to store raw binary data. I want to support both Python 2 and Python 3.
The bytes type in Python 2 (an alias of str) and bytes in Python 3 are completely different. They take different arguments to construct, index to different types, and have different str and repr.
What's the best way of unifying the code for both Python versions, using a single type to store raw data?
Upvotes: 7
Views: 1778
Reputation: 51979
The python-future package has a backport of the Python 3 bytes type.
>>> from builtins import bytes # in py2, this picks up the backport
>>> b = bytes(b'ABCD')
This provides the Python 3 interface in both Python 2 and Python 3. In Python 3, it is the builtin bytes type. In Python 2, it is a compatibility layer on top of the str type.
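As a rough illustration of what "the Python 3 interface" means here, the sketch below shows indexing, slicing, and iteration. It assumes the future package is installed when run on Python 2; on Python 3 the import simply resolves to the standard builtins module. The variable names are only for illustration.

```python
# On Python 3 this is the builtin type; on Python 2 this assumes the
# `future` package is installed, which supplies the `builtins` module.
from builtins import bytes

data = bytes(b'ABCD')

first = data[0]        # indexing yields an int (Python 3 behaviour)
head = data[:1]        # slicing yields a length-1 bytes object
values = list(data)    # iteration yields ints
```

With the backport, code written against these semantics should not need per-version branches for indexing.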
Upvotes: 3
Reputation: 288060
Assuming you only need to support Python 2.6 and newer, you can simply use bytes for, well, bytes. Use b literals to create bytes objects, such as b'\x0a\x0b\x00'. When working with files, make sure the mode includes a b (as in open('file.bin', 'rb')).
Beware that iteration and element access differ between the two versions, though. In these cases, you can write your code to use chunks (slices). Instead of b[0] == 0 (Python 3) or b[0] == b'\x00' (Python 2), write b[0:1] == b'\x00'. Other options are using bytearray (when the bytes need to be mutable; it indexes to integers on both versions) or helper functions.
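The helper-function approach could look something like the sketch below. The names byte_at and iter_bytes are made up for this example and do not come from any library; they only wrap the slicing trick described above so that the result is a length-1 bytes object on both versions.

```python
def byte_at(data, index):
    """Return the byte at `index` as a length-1 bytes object (hypothetical helper)."""
    if index < 0:
        index += len(data)
    if not 0 <= index < len(data):
        raise IndexError('byte index out of range')
    # Slicing returns bytes on both Python 2 and Python 3.
    return data[index:index + 1]


def iter_bytes(data):
    """Yield each byte of `data` as a length-1 bytes object (hypothetical helper)."""
    for i in range(len(data)):
        yield data[i:i + 1]


raw = b'\x00\x0a\x0b'
first = byte_at(raw, 0)
last = byte_at(raw, -1)
pieces = list(iter_bytes(raw))
```

Comparing byte_at(raw, 0) == b'\x00' then behaves the same on either interpreter.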
Strings of characters should be unicode in Python 2, independently of any Python 3 porting; otherwise the code would likely be wrong when encountering non-ASCII characters anyway. The equivalent is str in Python 3.
Use u literals to create character strings (such as u'Düsseldorf') and/or start every file with from __future__ import unicode_literals. Declare file encodings when necessary by starting files with # encoding: utf-8.
Use io.open to read character strings from files. For network code, fetch bytes and call decode on them to get a character string.
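A minimal sketch of both paths, assuming the data is UTF-8 encoded. The temporary file only stands in for a real file, and the raw bytes read back from it stand in for whatever a socket would return. The escape \xfc is used instead of a literal ü so the snippet does not depend on a source encoding declaration.

```python
import io
import os
import tempfile

# u'D\xfcsseldorf' is the same text as u'Düsseldorf'.
payload = u'D\xfcsseldorf'.encode('utf-8')

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, 'wb') as f:
        f.write(payload)

    # Text mode with an explicit encoding: returns a character string.
    with io.open(path, 'r', encoding='utf-8') as f:
        text_from_file = f.read()

    # Binary mode: returns bytes, as network code typically would.
    with io.open(path, 'rb') as f:
        raw = f.read()
finally:
    os.remove(path)

# Decode the bytes explicitly to get a character string.
text_from_bytes = raw.decode('utf-8')
```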
If you need to support Python 2.5 or 3.2, have a look at six to convert literals.
Add plenty of assertions to make sure that functions which operate on character strings don't get bytes, and vice versa. As usual, a good test suite with 100% coverage helps a lot.
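Such assertions might look like the sketch below. The names text_type, binary_type, checksum, and shout are invented for this example. Note that on Python 2 a plain 'abc' literal is a str and therefore passes the bytes check, so these assertions catch unicode-versus-bytes mixups rather than every possible mistake.

```python
text_type = type(u'')    # unicode on Python 2, str on Python 3
binary_type = bytes      # str on Python 2, bytes on Python 3


def checksum(data):
    """Hypothetical function that must only ever receive raw bytes."""
    assert isinstance(data, (binary_type, bytearray)), \
        'expected bytes, got %r' % type(data)
    # bytearray iterates as ints on both versions.
    return sum(bytearray(data)) % 256


def shout(text):
    """Hypothetical function that must only ever receive character strings."""
    assert isinstance(text, text_type), \
        'expected text, got %r' % type(text)
    return text.upper()
```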
Upvotes: 0
Reputation: 3993
If your project is small and simple, use six.
Otherwise I suggest having two independent codebases: one for Python 2 and one for Python 3. Initially it may sound like a lot of unnecessary work, but eventually it's actually a lot easier to maintain.
As an example of what your project may become if you decide to support both Pythons in a single codebase, take a look at Google's protobuf: lots of often counterintuitive branching all around the code, and abstractions that were modified just to allow hacks. And as your project evolves it won't get better: deadlines play against the quality of the code.
With two separate codebases you will simply apply almost identical patches, which isn't a lot of work compared to what is ahead of you if you want a single codebase. And it will be easier to migrate to Python 3 completely once the number of Python 2 users of your package drops.
Upvotes: 0
Reputation: 271
I don't know which parts of your code need to work with bytes. I almost always work with bytearrays, and this is how I do it when reading from a file:
with open(file, 'rb') as imageFile:
    f = imageFile.read()
    b = bytearray(f)
I took that right out of a project I am working on, and it works in both 2 and 3. Maybe something for you to look at?
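To show why this works on both versions, here is a self-contained variant of the same idea. The temporary file and the four sample bytes are placeholders, not part of the original project.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Write a few sample bytes so there is something to read back.
    with open(path, 'wb') as out:
        out.write(b'\x89PNG')

    with open(path, 'rb') as image_file:
        data = bytearray(image_file.read())
finally:
    os.remove(path)

first = data[0]     # an int on both Python 2 and Python 3
data[0] = 0x00      # bytearray is mutable, unlike bytes
```

Because bytearray indexes to integers on either interpreter, code that only touches data through a bytearray avoids the b[0] difference entirely.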
Upvotes: 0