tryanderror

Reputation: 173

Fastest way to swap bytes of big file in Python

For a project I need to swap the bytes of 4-byte words quickly. I have to reverse the byte order of every 4-byte word in a big file (2 MB) before I can run another calculation algorithm on it.

def word_swaper(data):
    buf_swaped_data = b""

    number_of_words = int(len(data) / 4)

    for word in range(number_of_words):
        newword = data[word*4:(word+1)*4]
        newword = newword[::-1]
        buf_swaped_data += newword

    return buf_swaped_data

Is there a faster or simpler way? I'm going to use this for files of about 2 MB, and the calculation currently takes about 1-2 minutes, which is way too long.

Upvotes: 1

Views: 601

Answers (1)

AKX

Reputation: 169052

Using a pair of io.BytesIO() objects benchmarks as more than 3x as fast on my machine, but there's a built-in method for this, array.array.byteswap(), that's roughly 550 times faster:

import timeit
import os
import io
import array


def original(data):
    buf_swaped_data = b""

    number_of_words = int(len(data) / 4)

    for word in range(number_of_words):
        newword = data[word * 4 : (word + 1) * 4]
        newword = newword[::-1]
        buf_swaped_data += newword
    return buf_swaped_data


def io_pair(data):
    in_io = io.BytesIO(data)
    out_io = io.BytesIO()
    while True:
        word = in_io.read(4)
        if not word:
            break
        out_io.write(word[::-1])
    return out_io.getvalue()


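def array_swap(data):
    # "I" (unsigned int) is 4 bytes on common platforms; "L" is 8 bytes on
    # 64-bit Linux/macOS and would swap 8-byte words there.
    arr = array.array("I", data)
    assert arr.itemsize == 4
    arr.byteswap()
    return bytes(arr)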


def t(f):
    data = b"1234" * 8000
    assert f(data) == original(data)
    count, time_taken = timeit.Timer(lambda: f(data)).autorange()
    print(f.__name__, count / time_taken)


t(original)
t(io_pair)
t(array_swap)

Results on my machine (calls per second, higher is faster):

original      186.465
io_pair       568.180
array_swap 102897.423
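
One caveat with the array approach: the item size of an array type code depends on the platform, so it is worth checking that the chosen code is really 4 bytes wide. Below is a minimal sketch of how that check could look, together with a platform-independent alternative based on extended slice assignment. The function names swap_words_array and swap_words_slices are made up for this illustration, and neither variant has been benchmarked here.

```python
import array


def swap_words_array(data: bytes) -> bytes:
    """Reverse the bytes of each 4-byte word using array.byteswap()."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    # Pick a type code whose items are exactly 4 bytes on this platform.
    for code in ("I", "L"):
        if array.array(code).itemsize == 4:
            arr = array.array(code, data)
            arr.byteswap()  # swaps the bytes of every item in place
            return arr.tobytes()
    raise RuntimeError("no 4-byte unsigned type code on this platform")


def swap_words_slices(data: bytes) -> bytes:
    """Same result without array: copy each byte lane to its mirrored lane."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    out = bytearray(len(data))
    out[0::4] = data[3::4]  # last byte of each word becomes the first
    out[1::4] = data[2::4]
    out[2::4] = data[1::4]
    out[3::4] = data[0::4]  # first byte of each word becomes the last
    return bytes(out)
```

For a file, the whole content can be read in binary mode and passed to either function. Testing with non-repeating data matters: a pattern like b"1234" repeated gives the same result for 4-byte and 8-byte swaps, so it would not reveal a wrong item size.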

Upvotes: 2
