Reputation: 3510
Please consider the following code. The purpose here is to create a typical ASCII dump of binary data, substituting non-printable characters with '.'.
Both segments of code print the same output. The second seems more "Pythonic", but is (in my measurements) 2-3 times slower than the first, presumably because it creates more temporary objects.
Since I'm going to be doing this many millions of times, performance matters. Is there a faster Pythonic way to do this?
ba = bytearray.fromhex("01610262") # non-printable binary data, 'a', binary, 'b'
for i in range(len(ba)):
    if not chr(ba[i]).isprintable():
        ba[i] = ord('.')
text = ba.decode("ascii")
print(text) # prints ".a.b"
ba = bytearray.fromhex("01610262")
text = bytes(map(lambda b: ord('.') if not chr(b).isprintable() else b, ba)).decode("ascii")
print(text) # prints ".a.b"
Version with performance measurement:
import time

ITERATIONS = 1000000

# First version: replace non-printable bytes in place with an explicit loop
start = time.time()
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    for i in range(0, len(ba)):
        if not chr(ba[i]).isprintable():
            ba[i] = ord('.')
    text = ba.decode("ascii")
    # print(text)
end = time.time()
print("first elapsed time:", (end-start))

# Second version: build a new bytes object with map() and a lambda
start = time.time()
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    text = bytes(map(lambda b: ord('.') if not chr(b).isprintable() else b, ba)).decode("ascii")
    # print(text)
end = time.time()
print("second elapsed time:", (end-start))
Outputs:
first elapsed time: 2.4349358081817627
second elapsed time: 5.805044889450073
UPDATE: I discovered that running the timing test from the command line, instead of from my IDE (which was attaching a debugger), both improved performance overall and radically reduced the difference between the two versions.
The accepted answer (using a translation table) is by far the fastest when run from the command line.
Marat's ''.join() solution in the comments falls between the first & second options above when run from command line (but is much worse under my debugger [using Visual Studio Code Python Extensions].)
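Since the numbers shifted so much between the debugger and the command line, it may be worth repeating the comparison with the standard `timeit` module, which repeats each measurement and lets you keep the best run. The following is only a sketch of that idea: the function names `loop_version`, `map_version` and `best_time` are made up for this illustration, and the iteration counts are arbitrary.

```python
import timeit

DATA_HEX = "01610262"  # same sample bytes as in the question


def loop_version():
    # In-place replacement with an explicit loop (first segment above)
    ba = bytearray.fromhex(DATA_HEX)
    for i in range(len(ba)):
        if not chr(ba[i]).isprintable():
            ba[i] = ord('.')
    return ba.decode("ascii")


def map_version():
    # map() plus lambda (second segment above)
    ba = bytearray.fromhex(DATA_HEX)
    return bytes(map(lambda b: ord('.') if not chr(b).isprintable() else b, ba)).decode("ascii")


def best_time(func, number=20_000, repeat=3):
    # Hypothetical helper: the minimum over several repeats is usually
    # the figure least disturbed by other activity on the machine.
    return min(timeit.repeat(func, number=number, repeat=repeat))


if __name__ == "__main__":
    for func in (loop_version, map_version):
        print(func.__name__, best_time(func))
```

Run it from a plain terminal rather than under a debugger, for the reason described in the update.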
Upvotes: 0
Views: 81
Reputation: 181
If you can be bothered to set up a translation table, I got much better results with that method on your example.
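# Reuses ITERATIONS and the time import from the question's benchmark.
# The table here only covers the two non-printable bytes in the example.
btrans = bytes.maketrans(b'\x01\x02', b'..')
start = time.time()
for z in range(ITERATIONS):
    ba = bytearray.fromhex("01610262")
    text = ba.translate(btrans).decode('ascii')
    # print(text)
end = time.time()
print("third elapsed time:", (end-start))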
first elapsed time: 1.4424219131469727
second elapsed time: 1.1425127983093262
third elapsed time: 0.3709402084350586
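The table above only handles the two byte values in the example. For arbitrary input you would probably want a table covering all 256 byte values, built once up front. Here is a minimal sketch, assuming "printable" means the ASCII range 0x20 to 0x7E (space through '~'); that is slightly narrower than `str.isprintable()`, which also accepts many characters above 0x7F that would then fail to decode as ASCII. The names `ASCII_DUMP_TABLE` and `ascii_dump` are just illustrative.

```python
DOT = ord('.')

# 256-entry table: printable ASCII maps to itself, everything else to '.'.
# The 0x20-0x7E range is an assumption about what counts as printable.
ASCII_DUMP_TABLE = bytes(b if 0x20 <= b <= 0x7E else DOT for b in range(256))


def ascii_dump(data):
    """Return data (bytes or bytearray) as text with non-printable bytes shown as '.'."""
    return data.translate(ASCII_DUMP_TABLE).decode("ascii")


print(ascii_dump(bytearray.fromhex("01610262")))  # prints ".a.b"
```

Because the table is built once, the per-call cost is just one `translate()` and one `decode()`, both of which run in C.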
Upvotes: 2