Reputation: 63
I am trying to compress a 300 GB virtual machine file.
Every single time, the Python script is killed because the actual memory usage of the gzip module exceeds 30 GB (virtual memory).
Is there any way to achieve large-file (300 GB to 64 TB) compression using Python?
import gzip
import time

def gzipFile(fileName):
    startTime = time.time()
    with open(fileName, 'rb') as fileHandle:
        compressedFileName = "%s-1.gz" % fileName
        with gzip.open(compressedFileName, 'wb') as compressedFH:
            compressedFH.writelines(fileHandle)
    finalTime = time.time() - startTime
    print("gzipFile=%s fileName=%s" % (finalTime, compressedFileName))
Upvotes: 2
Views: 967
Reputation: 91149
with gzip.open(compressedFileName, 'wb') as compressedFH:
    compressedFH.writelines(fileHandle)
writes the file fileHandle line by line, i.e. it splits it into chunks separated by the \n character.
While it is quite probable that this character occurs from time to time in a binary file as well, this is not guaranteed. If newlines are rare, a single "line" can be many gigabytes long and must be held in memory all at once, which is exactly what drives the memory usage up.
It might be better to do
with gzip.open(compressedFileName, 'wb') as compressedFH:
    while True:
        chunk = fileHandle.read(65536)
        if not chunk:
            break  # end of the input file reached
        compressedFH.write(chunk)
or, as tqzf writes in a comment,
import shutil

with gzip.open(compressedFileName, 'wb') as compressedFH:
    shutil.copyfileobj(fileHandle, compressedFH)
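Putting it together, here is a minimal sketch of the questioner's gzipFile function rewritten with the streaming fix; the function name gzip_file_streaming and the 1 MiB buffer size are just illustrative choices, not anything from the original post:

import gzip
import shutil
import time

def gzip_file_streaming(file_name):
    # Stream the input into the gzip writer in fixed-size chunks via
    # shutil.copyfileobj, so memory stays bounded regardless of file size.
    start_time = time.time()
    compressed_file_name = "%s-1.gz" % file_name
    with open(file_name, 'rb') as file_handle:
        with gzip.open(compressed_file_name, 'wb') as compressed_fh:
            shutil.copyfileobj(file_handle, compressed_fh, length=1024 * 1024)
    print("gzipFile=%s fileName=%s" % (time.time() - start_time, compressed_file_name))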
Upvotes: 3
Reputation: 8447
from subprocess import call
call(["tar", "-pczf name_of_your_archive.tar.gz /path/to/directory"])
Run it externally; it is the simplest way and probably the fastest.
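If the goal is a single file rather than a directory, the same idea works by calling gzip itself. A minimal sketch, assuming GNU gzip with the --keep option is available and using a placeholder path (the external tool streams the file on its own, so the Python process uses almost no memory):

from subprocess import call

file_name = "/path/to/vm-image.img"  # hypothetical path
# gzip writes file_name + ".gz" next to the input; --keep preserves the original.
ret = call(["gzip", "--keep", file_name])
if ret != 0:
    raise RuntimeError("gzip exited with status %d" % ret)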
Upvotes: 2