Brad

Reputation: 3510

Fast Pythonic way to convert ByteArray to string with in-place substitutions

Please consider the following code. Its purpose is to create a typical ASCII dump of binary data, replacing non-printable characters with '.'.

Both segments of code print the same output. The second seems more "Pythonic", but is (in my measurements) 2-3 times slower than the first, presumably because it creates more temporary objects.

Since I'm going to be doing this many millions of times, performance matters. Is there a faster Pythonic way to do this?

ba = bytearray.fromhex("01610262") # non-printable binary data, 'a', binary, 'b'
for i in range(len(ba)):
    if not chr(ba[i]).isprintable():
        ba[i] = ord('.')
text = ba.decode("ascii")
print(text)  # prints ".a.b"

ba = bytearray.fromhex("01610262")
text = bytes(map(lambda b: ord('.') if not chr(b).isprintable() else b, ba)).decode("ascii")
print(text)  # prints ".a.b"

Version with performance measurement:

import time

ITERATIONS = 1000000

start = time.time()
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    for i in range(len(ba)):
        if not chr(ba[i]).isprintable():
            ba[i] = ord('.')
    text = ba.decode("ascii")
    #print(text)
end = time.time()
print("first elapsed time:", (end-start))

start = time.time()
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    text = bytes(map(lambda b: ord('.') if not chr(b).isprintable() else b, ba)).decode("ascii")
    #print(text)
end = time.time()
print("second elapsed time:", (end-start))

Outputs:

first elapsed time: 2.4349358081817627
second elapsed time: 5.805044889450073

UPDATE: Running the timing test from the command line instead of from my IDE (which was attaching a debugger) both improved performance overall and radically reduced the difference between the two versions.

The accepted answer (using a translation table) is by far the fastest when run from the command line.

Marat's ''.join() solution in the comments falls between the first and second options above when run from the command line, but is much worse under my debugger (Visual Studio Code with the Python extension).
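Since the debugger skewed the numbers so badly, a harness based on the standard `timeit` module may give steadier measurements than hand-rolled `time.time()` loops. The following is only a sketch: `loop_version`, `map_version`, and `benchmark` are names made up for illustration, and the two functions are meant to mirror the two snippets above (they hoist `ord('.')` into a constant, which the original code does not).

```python
import timeit

DOT = ord(".")  # hoisted constant; the original snippets call ord('.') inline


def loop_version(data: bytes) -> str:
    """In-place substitution on a bytearray copy (mirrors the first snippet)."""
    ba = bytearray(data)
    for i in range(len(ba)):
        if not chr(ba[i]).isprintable():
            ba[i] = DOT
    return ba.decode("ascii")


def map_version(data: bytes) -> str:
    """map/lambda substitution (mirrors the second snippet)."""
    return bytes(
        map(lambda b: DOT if not chr(b).isprintable() else b, data)
    ).decode("ascii")


def benchmark(func, data, number=20_000, repeat=5):
    """Return the best observed per-call time in seconds."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    sample = bytes.fromhex("01610262")
    for func in (loop_version, map_version):
        per_call = benchmark(func, sample)
        print(f"{func.__name__}: {per_call * 1e6:.3f} us per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes. Run it from a plain command line rather than under a debugger.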

Upvotes: 0

Views: 81

Answers (1)

Jonathan

Reputation: 181

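If you can be bothered to set up a translation table, that method gave me much better results on your example. Note that the table below only maps the two non-printable bytes that occur in your sample (\x01 and \x02); real data would need every non-printable byte value mapped.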

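# Reuses ITERATIONS and the time import from the question's timing code.
btrans = bytes.maketrans(b'\x01\x02', b'..')  # build the table once, outside the loop
start = time.time()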
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    text = ba.translate(btrans).decode('ascii')
    #print(text)
end = time.time()
print("third elapsed time:", (end-start))

first elapsed time: 1.4424219131469727
second elapsed time: 1.1425127983093262
third elapsed time: 0.3709402084350586
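For arbitrary input, the table would have to cover all 256 byte values. Here is a minimal sketch, assuming "printable" means the ASCII range 0x20 (space) through 0x7E (~); `DUMP_TABLE` and `ascii_dump` are illustrative names, not anything from the question. This is not quite the same test as `chr(b).isprintable()`: that check also accepts some bytes above 0x7F (0xE9, for example), which would then make `.decode("ascii")` raise `UnicodeDecodeError`.

```python
# 256-entry table: printable ASCII maps to itself, everything else to '.'.
# Assumption: "printable" means 0x20 (space) through 0x7E (~).
DUMP_TABLE = bytes(b if 0x20 <= b <= 0x7E else ord(".") for b in range(256))


def ascii_dump(data) -> str:
    """Return the ASCII-dump text for a bytes or bytearray object."""
    # translate() returns a new object of the same type as data,
    # so the caller's buffer is left unmodified.
    return data.translate(DUMP_TABLE).decode("ascii")


if __name__ == "__main__":
    print(ascii_dump(bytearray.fromhex("01610262")))  # prints ".a.b"
```

The table is built once at import time, so each call costs one `translate` and one `decode`, both of which run in C.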

Upvotes: 2
