Richard

Reputation: 63

How to compress a 300GB file using Python

I am trying to compress a 300GB virtual machine file.

Every single time, the Python script is killed because the actual memory usage of the gzip module exceeds 30GB (virtual memory).

Is there any way to compress large files (300GB to 64TB) using Python?

import gzip
import time

def gzipFile(fileName):
    startTime = time.time()
    with open(fileName, 'rb') as fileHandle:
        compressedFileName = "%s-1.gz" % fileName
        with gzip.open(compressedFileName, 'wb') as compressedFH:
            compressedFH.writelines(fileHandle)

    finalTime = time.time() - startTime
    print("gzipFile=%s fileName=%s" % (finalTime, compressedFileName))

Upvotes: 2

Views: 967

Answers (2)

glglgl

Reputation: 91149

with gzip.open(compressedFileName, 'wb') as compressedFH:
    compressedFH.writelines(fileHandle)

writes the file fileHandle line by line, i.e. it splits it into chunks separated by the \n character.

While it is quite probable that this character occurs from time to time in a binary file as well, this is not guaranteed, and everything between two \n bytes has to be read into memory as a single piece. In a 300GB binary image such a stretch can be enormous, which is exactly what drives the memory usage up.
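
To see why this matters for memory, here is a minimal sketch (the in-memory blob only simulates a newline-free stretch of a binary file): iterating a binary file handle returns everything up to and including the next \n byte, so a long run without newlines comes back as one huge chunk that must fit in memory.

import io

# Simulate 10 MB of binary data that contains no newline until the very end.
blob = io.BytesIO(b"\x00" * 10_000_000 + b"\n" + b"tail")

first_chunk = next(iter(blob))   # the same iteration writelines() performs
print(len(first_chunk))          # 10000001: the whole run plus the trailing \n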

It might be better to do

with gzip.open(compressedFileName, 'wb') as compressedFH:
    while True:
        chunk = fileHandle.read(65536)
        if not chunk:
            break  # leave the while loop at end of file
        compressedFH.write(chunk)

or, as tqzf writes in a comment,

import shutil

with gzip.open(compressedFileName, 'wb') as compressedFH:
    shutil.copyfileobj(fileHandle, compressedFH)
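
copyfileobj copies in fixed-size chunks, so memory use stays bounded no matter how large the input is. A complete version of this approach might look as follows (a sketch only; the helper name is illustrative, and the 1 MiB buffer passed as the optional third argument is an arbitrary choice):

import gzip
import shutil

def gzipFileStreaming(fileName):
    compressedFileName = "%s-1.gz" % fileName
    with open(fileName, 'rb') as fileHandle:
        with gzip.open(compressedFileName, 'wb') as compressedFH:
            # The third argument is the copy buffer size in bytes.
            shutil.copyfileobj(fileHandle, compressedFH, 1024 * 1024)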

Upvotes: 3

Migol

Reputation: 8447

from subprocess import call
call(["tar", "-pczf", "name_of_your_archive.tar.gz", "/path/to/directory"])

Run it externally; it is the simplest way and probably the fastest.
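
Since the question is about a single file rather than a directory, the same idea also works with the external gzip binary directly (a sketch; the helper name is mine, it assumes gzip is available on the PATH, and -c makes gzip write to stdout so the original file is left in place):

import subprocess

def gzipExternal(fileName):
    # Let the external gzip process do the work; the data never passes through Python.
    with open(fileName + ".gz", "wb") as out:
        subprocess.run(["gzip", "-c", fileName], stdout=out, check=True)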

Upvotes: 2
