Reputation: 11
I am trying to apply an XOR operation to a number of files, some of which are very large.
Basically I am getting a file and XOR-ing it byte by byte (or at least this is what I think I'm doing). When it hits a larger file (around 70MB) I get an out-of-memory error and my script crashes.
My computer has 16GB of RAM with more than 50% of it available, so I would not relate this to my hardware.
def xor3(source_file, target_file):
    b = bytearray(open(source_file, 'rb').read())
    for i in range(len(b)):
        b[i] ^= 0x71
    open(target_file, 'wb').write(b)
I tried to read the file in chunks, but it seems I'm too inexperienced for this, as the output is not the desired one. The first function returns what I want, of course :)
def xor(data):
    b = bytearray(data)
    for i in range(len(b)):
        b[i] ^= 0x41
    return data
def xor4(source_file, target_file):
    with open(source_file, 'rb') as ifile:
        with open(target_file, 'w+b') as ofile:
            data = ifile.read(1024*1024)
            while data:
                ofile.write(xor(data))
                data = ifile.read(1024*1024)
What is the appropriate solution for this kind of operation? What is it that I am doing wrong?
Upvotes: 1
Views: 3573
Reputation: 5945
This probably only works in Python 2, which shows again how much nicer it is to use for byte streams:
def xor(infile, outfile, val=0x71, chunk=1024):
    with open(infile, 'r') as inf:
        with open(outfile, 'w') as outf:
            c = inf.read(chunk)
            while c != '':
                s = "".join([chr(ord(cc) ^ val) for cc in c])
                outf.write(s)
                c = inf.read(chunk)
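For comparison, here is a sketch of the same idea in Python 3, which is not part of the original answer: it opens the files in binary mode and uses a precomputed 256-byte translation table (the name xor_py3 is made up for illustration):

def xor_py3(infile, outfile, val=0x71, chunk=1024):
    # Precompute a lookup table mapping every byte b to b ^ val.
    table = bytes(b ^ val for b in range(256))
    with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
        for block in iter(lambda: inf.read(chunk), b''):
            # bytes.translate applies the table to each byte at C speed.
            outf.write(block.translate(table))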
Upvotes: 0
Reputation: 17506
Iterate lazily over the large file.
from operator import xor
from functools import partial

def chunked(file, chunk_size):
    return iter(lambda: file.read(chunk_size), b'')

myoperation = partial(xor, 0x71)

with open(source_file, 'rb') as source, open(target_file, 'ab') as target:
    processed = (map(myoperation, bytearray(data)) for data in chunked(source, 65536))
    for data in processed:
        target.write(bytearray(data))
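The chunked helper relies on the two-argument form of iter, which keeps calling the given callable until it returns the sentinel b''. A small self-contained illustration of that pattern on an in-memory stream (the sample data here is made up):

import io

stream = io.BytesIO(b'abcdefghij')
for block in iter(lambda: stream.read(4), b''):
    print(block)  # prints b'abcd', then b'efgh', then b'ij'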
Upvotes: 0
Reputation: 2543
Unless I am mistaken, in your second example you create a copy of data by calling bytearray and assigning it to b. Then you modify b, but return data. The modification of b has no effect on data itself.
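A minimal sketch of the fix this implies (not part of the original answer) is to return the modified copy instead of the untouched input:

def xor(data):
    b = bytearray(data)  # mutable copy of the input bytes
    for i in range(len(b)):
        b[i] ^= 0x41
    return b  # return the modified copy, not the original data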
Upvotes: 0
Reputation: 13581
Read the file in chunks and append each processed chunk to the output file. Note that read already advances the file position, so an explicit seek is not needed:
CHUNK_SIZE = 1000  # for example
with open(source_file, 'rb') as source:
    with open(target_file, 'ab') as target:  # append in binary mode
        chunk = bytearray(source.read(CHUNK_SIZE))
        while chunk:
            for i in range(len(chunk)):
                chunk[i] ^= 0x71
            target.write(chunk)
            chunk = bytearray(source.read(CHUNK_SIZE))
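One property worth using when testing any of these variants: XOR with a fixed key is its own inverse, so applying the transformation twice must reproduce the original input. A tiny self-contained check (the sample data here is made up):

key = 0x71
data = b'some sample bytes'
encoded = bytes(b ^ key for b in data)
decoded = bytes(b ^ key for b in encoded)
assert decoded == data  # XOR-ing twice with the same key round-trips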
Upvotes: 3