salezica

Reputation: 77029

Supporting python 2 and 3: str, bytes or alternative

I have a Python2 codebase that makes extensive use of str to store raw binary data. I want to support both Python2 and Python3.

The bytes type (an alias of str) in Python 2 and bytes in Python 3 are completely different. They take different constructor arguments, index to different types, and have different str and repr behaviour.
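A minimal illustration of the mismatch (results noted in comments):

data = b'\x01\x02'

# Python 2: bytes is str, so indexing yields 1-character strings
#   data[0] == '\x01'
#   bytes(5) == '5'
# Python 3: bytes is a distinct type, so indexing yields ints
#   data[0] == 1
#   bytes(5) == b'\x00\x00\x00\x00\x00'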

What's the best way of unifying the code for both Python versions, using a single type to store raw data?

Upvotes: 7

Views: 1778

Answers (4)

MisterMiyagi

Reputation: 51979

The python-future package has a backport of the Python3 bytes type.

>>> from builtins import bytes  # in py2, this picks up the backport
>>> b = bytes(b'ABCD')

This provides the Python 3 interface in both Python 2 and Python 3. In Python 3, it is the builtin bytes type. In Python 2, it is a compatibility layer on top of the str type.
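Indexing and iteration then behave the Python 3 way on both versions (a quick check, continuing the snippet above and assuming python-future is installed):

>>> b[0]
65
>>> list(b)
[65, 66, 67, 68]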

Upvotes: 3

phihag

Reputation: 288060

Assuming you only need to support Python 2.6 and newer, you can simply use bytes for, well, bytes. Use b literals to create bytes objects, such as b'\x0a\x0b\x00'. When working with files, make sure the mode includes a b (as in open('file.bin', 'rb')).
Beware that iteration and element access are different, though. In these cases, you can write your code to use chunks. Instead of b[0] == 0 (Python 3) or b[0] == b'\x00' (Python 2), write b[0:1] == b'\x00'. Other options are using bytearray (when the bytes are mutable) or helper functions.
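A small sketch of the chunked access described above (behaves identically on Python 2.6+ and Python 3):

data = b'\x00\x10\xff'

assert data[0:1] == b'\x00'      # 1-byte slice instead of element access
assert data[-1:] == b'\xff'

for i in range(len(data)):
    chunk = data[i:i + 1]        # always a length-1 bytes object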

Strings of characters should be unicode in Python 2, independently of any Python 3 porting; otherwise the code would likely be wrong when it encounters non-ASCII characters anyway. The equivalent type is str in Python 3.
Use u literals to create character strings (such as u'Düsseldorf') and/or make sure every file starts with from __future__ import unicode_literals. Declare source file encodings when necessary by starting files with # encoding: utf-8.
Use io.open to read character strings from files. For network code, fetch bytes and call decode on them to get a character string.
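Putting those pieces together, a sketch might look like this (the file name is made up for the example):

# encoding: utf-8
from __future__ import unicode_literals
import io

name = 'Düsseldorf'                  # a character (unicode) string on both versions

with io.open('cities.txt', encoding='utf-8') as f:   # hypothetical file
    text = f.read()                  # decoded character data on both versions

raw = b'D\xc3\xbcsseldorf'           # e.g. bytes received from a socket
assert raw.decode('utf-8') == name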

If you need to support Python 2.5 or 3.2, have a look at six to convert literals.
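With six, the literal helpers look roughly like this (six.b, six.u, six.binary_type and six.text_type are part of six's documented API):

import six

data = six.b('\x0a\x0b\x00')     # bytes on both Python 2 and 3
text = six.u('some text')        # character string on both Python 2 and 3

assert isinstance(data, six.binary_type)
assert isinstance(text, six.text_type)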

Add plenty of assertions to make sure that functions which operate on character strings don't get bytes, and vice versa. As usual, a good test suite with 100% coverage helps a lot.
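For example, such an assertion might look like this (write_frame and its socket argument are made up for illustration):

def write_frame(sock, frame):
    # raw binary data only: catch unicode objects before they get
    # implicitly encoded (Python 2) or raise a confusing error (Python 3)
    assert isinstance(frame, bytes), 'expected bytes, got %r' % type(frame)
    sock.sendall(frame)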

Upvotes: 0

Kentzo

Reputation: 3993

If your project is small and simple, use six.

Otherwise I suggest having two independent codebases: one for Python 2 and one for Python 3. Initially it may sound like a lot of unnecessary work, but in the long run it's actually easier to maintain.

As an example of what your project may become if you decide to support both Pythons in a single codebase, take a look at Google's protobuf: lots of often counterintuitive branching all around the code, and abstractions that were modified just to allow hacks. And as your project evolves it won't get better: deadlines play against code quality.

With two separate codebases you will simply apply almost identical patches, which isn't a lot of work compared to what is ahead of you with a single codebase. And it will be easier to migrate to Python 3 completely once the number of Python 2 users of your package drops.

Upvotes: 0

Faling Dutchman

Reputation: 271

I don't know which parts you want to work with as bytes. I almost always work with bytearrays, and this is how I do it when reading from a file:

with open(file, 'rb') as imageFile:   # 'file' holds the path; 'rb' gives raw bytes
    f = imageFile.read()              # bytes on Python 3, str on Python 2
    b = bytearray(f)                  # mutable on both versions

I took that right out of a project I am working on, and it works in both 2 and 3. Maybe something for you to look at?
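One reason this works well for porting: bytearray indexes to ints and is mutable on both versions (a small sketch):

b = bytearray(b'\x89PNG')

assert b[0] == 0x89            # an int on both Python 2 and 3
b[0] = 0x00                    # in-place mutation works on both
assert bytes(b) == b'\x00PNG'  # back to an immutable bytes/str when needed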

Upvotes: 0
