Vlad M

Reputation: 11

XOR-ing a large file in Python

I am trying to apply an XOR operation to a number of files, some of which are very large.
Basically I am taking a file and XOR-ing it byte by byte (or at least that is what I think I'm doing). When it hits a larger file (around 70 MB) I get an out-of-memory error and my script crashes.
My computer has 16 GB of RAM with more than 50% of it available, so I would not attribute this to my hardware.

def xor3(source_file, target_file):
    b = bytearray(open(source_file, 'rb').read())
    for i in range(len(b)):
        b[i] ^= 0x71
    open(target_file, 'wb').write(b)

I tried to read the file in chunks, but it seems I'm too inexperienced for this, as the output is not the desired one. The first function returns what I want, of course :)

def xor(data):
    b = bytearray(data)
    for i in range(len(b)):
        b[i] ^= 0x41
    return data


def xor4(source_file, target_file):
    with open(source_file,'rb') as ifile:
        with open(target_file, 'w+b') as ofile:
            data = ifile.read(1024*1024)
            while data:
                ofile.write(xor(data))
                data = ifile.read(1024*1024)


What is the appropriate solution for this kind of operation? What am I doing wrong?

Upvotes: 1

Views: 3573

Answers (4)

Jan Christoph Terasa

Reputation: 5945

This probably only works in Python 2, which shows again that it's much nicer to use for byte streams:

def xor(infile, outfile, val=0x71, chunk=1024):
    with open(infile, 'r') as inf:
        with open(outfile, 'w') as outf:
            c = inf.read(chunk)
            while c != '':
                # In Python 2, str is a byte string, so ord/chr work per byte
                s = "".join([chr(ord(cc) ^ val) for cc in c])
                outf.write(s)
                c = inf.read(chunk)
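
For comparison, a rough Python 3 sketch of the same idea (not part of the original answer; the function name and chunk size are just illustrative) would open both files in binary mode and XOR through a precomputed 256-byte translation table:

def xor_file(infile, outfile, val=0x71, chunk=1024 * 1024):
    # Precompute a table mapping every byte b to b ^ val
    table = bytes(b ^ val for b in range(256))
    with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
        for block in iter(lambda: inf.read(chunk), b''):
            # bytes.translate applies the table to every byte of the block
            outf.write(block.translate(table))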

Upvotes: 0

Sebastian Wozny

Reputation: 17506

Iterate lazily over the large file by reading it in fixed-size chunks; the two-argument form of iter calls file.read(chunk_size) repeatedly until it returns the empty-bytes sentinel b''.

from operator import xor
from functools import partial

def chunked(file, chunk_size):
    # Read fixed-size blocks until read() returns the empty-bytes sentinel
    return iter(lambda: file.read(chunk_size), b'')

myoperation = partial(xor, 0x71)

with open(source_file, 'rb') as source, open(target_file, 'wb') as target:
    processed = (map(myoperation, bytearray(data)) for data in chunked(source, 65536))
    for data in processed:
        target.write(bytearray(data))

Upvotes: 0

data

Reputation: 2543

Unless I am mistaken, in your second example, you create a copy of data by calling bytearray and assigning it to b. Then you modify b, but return data. The modification in b has no effect on data itself.
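
In other words, returning the modified buffer instead of the untouched input should be enough to make the chunked version behave; a minimal sketch of that fix, reusing the question's own helper:

def xor(data):
    b = bytearray(data)  # copy the chunk into a mutable buffer
    for i in range(len(b)):
        b[i] ^= 0x41
    return b  # return the modified copy, not the original data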

Upvotes: 0

m.antkowicz

Reputation: 13581

Read the file in chunks, using seek to advance the position, and append each processed chunk to the output file:

CHUNK_SIZE = 1000  # for example

with open(source_file, 'rb') as source:
    with open(target_file, 'ab') as target:
        position = 0
        chunk = bytearray(source.read(CHUNK_SIZE))
        while chunk:
            for i in range(len(chunk)):
                chunk[i] ^= 0x71

            target.write(chunk)
            position += CHUNK_SIZE
            source.seek(position)  # move to the start of the next chunk
            chunk = bytearray(source.read(CHUNK_SIZE))

Upvotes: 3
