Reputation: 24207
I've written a small cryptographic module in Python whose task is to encrypt a file and put the result in a tarfile. The original file to encrypt can be quite large, but that's not a problem because my program only needs to work with a small block of data at a time, which can be encrypted on the fly and stored.
I'm looking for a way to avoid doing it in two passes: first writing all the data to a temporary file, then inserting the result into a tarfile.
Basically I do the following (where generator_encryptor is a simple generator that yields chunks of data read from the source file):
t = tarfile.open("target.tar", "w")
# First pass: write the encrypted chunks to a temporary file.
tmp = open('content', 'wb')
for chunk in generator_encryptor("sourcefile"):
    tmp.write(chunk)
tmp.close()
# Second pass: add the temporary file to the archive.
t.add('content')
t.close()
I'm a bit annoyed at having to use a temporary file, as I feel it should be easy to write blocks directly into the tar file. But collecting every chunk into a single string and using something like t.addfile('content', StringIO(bigcipheredstring)) seems excluded, because I can't guarantee that I have enough memory to hold bigcipheredstring.
Any hint on how to do that?
Upvotes: 3
Views: 4296
Reputation: 49843
You can create your own file-like object and pass it to TarFile.addfile. Your file-like object will generate the encrypted contents on the fly in the fileobj.read() method.
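A minimal sketch of the idea (the class name is mine, and the follow-up answer below fleshes it out; tarfile only ever calls read() on the object passed as fileobj, so that is the one method you really need):

class EncryptedStream(object):
    """File-like wrapper around an iterator of encrypted chunks."""
    def __init__(self, chunks):
        self.chunks = chunks  # iterator yielding strings of ciphertext
        self.buf = ''
    def read(self, size):
        # Accumulate chunks until we can serve `size` bytes or run out.
        while len(self.buf) < size:
            try:
                self.buf += next(self.chunks)
            except StopIteration:
                break
        data, self.buf = self.buf[:size], self.buf[size:]
        return data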
Upvotes: 4
Reputation: 24207
Basically, using a file-like object and passing it to TarFile.addfile does the trick, but there are still some open issues.
The resulting code is below. Basically I had to write a wrapper class that transforms my existing generator into a file-like object. I also added the GeneratorEncryptor class to my example to make the code complete. You can notice it has a __len__ method that returns the length of the written file (but understand it's just a dummy placeholder that does nothing useful).
import tarfile

class GeneratorEncryptor(object):
    """Dummy class for testing purposes.

    The real one performs on-the-fly encryption of the source file.
    """
    def __init__(self, source):
        self.source = source
        self.BLOCKSIZE = 1024
        self.NBBLOCKS = 1000

    def __call__(self):
        for c in range(0, self.NBBLOCKS):
            yield self.BLOCKSIZE * str(c % 10)

    def __len__(self):
        return self.BLOCKSIZE * self.NBBLOCKS

class GeneratorToFile(object):
    """Transform a data generator into a conventional file handle."""
    def __init__(self, generator):
        self.buf = ''
        self.generator = generator()

    def read(self, size):
        # Grow the buffer until it can serve `size` bytes, or hit EOF.
        chunk = self.buf
        while len(chunk) < size:
            try:
                chunk = chunk + self.generator.next()
            except StopIteration:
                self.buf = ''
                return chunk
        self.buf = chunk[size:]
        return chunk[:size]

t = tarfile.open("target.tar", "w")
generator = GeneratorEncryptor("source")
# Create an empty placeholder file so gettarinfo has something to stat.
open('content', 'wb').close()
ti = t.gettarinfo(name="content")
ti.size = len(generator)
t.addfile(ti, fileobj=GeneratorToFile(generator))
t.close()
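Note that this works because the member's size is stored in the tar header, which is written before any of the data. That is why ti.size has to be set up front, and why the generator must be able to report its total length before producing a single chunk.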
Upvotes: 2
Reputation: 400149
Huh? Can't you just use the subprocess module to run a pipe through to tar? That way, no temporary file should be needed. Of course, this won't work if you can't generate your data in small enough chunks to fit in RAM, but if you have that problem, then tar isn't the issue.
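For what it's worth, the plumbing side of that is easy with subprocess; the hard part is that, as far as I know, stock tar has no option to take a member's contents from stdin, so the command below is only a placeholder for whatever archiver can:

import subprocess
# 'somecmd' is hypothetical: a stand-in for an archiver that reads
# member data from stdin; plain GNU tar cannot do this directly.
proc = subprocess.Popen(['somecmd', 'target.tar'], stdin=subprocess.PIPE)
for chunk in generator_encryptor('sourcefile'):
    proc.stdin.write(chunk)
proc.stdin.close()
proc.wait()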
Upvotes: 2
Reputation: 56382
I guess you need to understand how the tar format works, and handle the tar writing yourself. Maybe this can be helpful?
http://mail.python.org/pipermail/python-list/2001-August/100796.html
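If you go down that road, the format is simple for the basic case: each member is a 512-byte header followed by the data padded to a multiple of 512 bytes, and the archive ends with two zero-filled blocks. A rough sketch (assuming the total size is known up front, and leaning on tarfile.TarInfo just to build the header):

import tarfile

BLOCK = 512

def stream_member_to_tar(name, size, chunks, out):
    # The size lives in the header, so it must be known before the data.
    ti = tarfile.TarInfo(name)
    ti.size = size
    out.write(ti.tobuf())
    written = 0
    for chunk in chunks:
        out.write(chunk)
        written += len(chunk)
    # Pad the member data out to a full 512-byte block.
    if written % BLOCK:
        out.write('\0' * (BLOCK - written % BLOCK))

out = open('target.tar', 'wb')
data = ['x' * 512, 'y' * 100]  # stand-in for the encrypted chunks
stream_member_to_tar('content', sum(len(c) for c in data), iter(data), out)
out.write('\0' * (2 * BLOCK))  # end-of-archive marker: two zero blocks
out.close()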
Upvotes: 1