Reputation: 243
I have a "bytes" object and an "int" mask, I want to do a xor over all the bytes with my mask. I do this action repeatedly over big "bytes" objects (~ 4096 KB).
This is the code I have which does the work well, only it is very CPU intensive and slows down my script:
# 'data' is bytes and 'mask' is int
bmask = struct.pack('!I', mask) # converting the "int" mask to "bytes" of 4 bytes
a = bytes(b ^ m for b, m in zip(data, itertools.cycle(bmask)))
The best I could come up with is this, which is about 20 times faster:
# 'data' is bytes and 'mask' is int
# reversing the bytes of the mask
bmask = struct.pack("<I", mask)
mask = struct.unpack(">I", bmask)[0]
# converting from bytes to array of "int"s
arr = array.array("I", data)
# looping over the "int"s
for i in range(len(arr)):
arr[i] ^= mask
# must return bytes
a = bytes(arr)
My questions are:
P.S. if it is of any importance, I'm using Python 3.5
Upvotes: 10
Views: 2580
Reputation: 8370
An alternative in case you don't want to use numpy. The advantage comes from making a single comparison, while extending the mask size to the needed (depending on the datasize).
@timed
def do_mask_int(data, mask):
intdata = int.from_bytes(data, byteorder='little', signed=False)
strmask = format(mask,'0x')
strmask = strmask * ((intdata.bit_length() + 31) // 32)
n = intdata ^ int(strmask, 16)
return n.to_bytes(((n.bit_length() + 7) // 8), 'little') or b'\0'
results are as it follows:
make_data: 8.288754 seconds
do_mask_arr1: 0.258530 seconds
do_mask_arr2: 0.253095 seconds
True
do_mask_numpy: 0.010309 seconds
True
do_mask_int: 0.060408 seconds
True
Still credits to numpy for being faster, but maybe one doesn't want to include it in production environment.
:] Best
Upvotes: 2
Reputation: 55499
I don't think you can get much faster than your algorithm, using pure Python. (But Fabio Veronese's answer shows that's not true). You can shave off a tiny bit of time by doing the looping in a list comprehension, but then that list needs to be converted back into an array, and the array has to be converted to bytes, so it uses more RAM for a negligible benefit.
However, you can make this much faster by using Numpy. Here's a short demo.
from time import perf_counter
from random import randrange, seed
import array
import numpy as np
seed(42)
def timed(func):
''' Timing decorator '''
def wrapped(*args):
start = perf_counter()
result = func(*args)
stop = perf_counter()
print('{}: {:.6f} seconds'.format(func.__name__, stop - start))
return result
wrapped.__name__ = func.__name__
wrapped.__doc__ = func.__doc__
return wrapped
@timed
def do_mask_arr1(data, mask):
arr = array.array("I", data)
# looping over the "int"s
for i in range(len(arr)):
arr[i] ^= mask
return arr.tobytes()
@timed
def do_mask_arr2(data, mask):
arr = array.array("I", data)
return array.array("I", [u ^ mask for u in arr]).tobytes()
@timed
def do_mask_numpy(data, mask):
return (np.fromstring(data, dtype=np.uint32) ^ mask).tobytes()
@timed
def make_data(datasize):
''' Make some random bytes '''
return bytes(randrange(256) for _ in range(datasize))
datasize = 100000
mask = 0x12345678
data = make_data(datasize)
d1 = do_mask_arr1(data, mask)
d2 = do_mask_arr2(data, mask)
print(d1 == d2)
d3 = do_mask_numpy(data, mask)
print(d1 == d3)
typical output
make_data: 0.751557 seconds
do_mask_arr1: 0.026865 seconds
do_mask_arr2: 0.025110 seconds
True
do_mask_numpy: 0.000438 seconds
True
Tested using Python 3.6.0 on an old single core 32 bit 2GHz machine running on Linux.
I just did a run with datasize = 4000000
and do_mask_numpy
took 0.0422 seconds.
Upvotes: 3