user3111358

Reputation: 311

Using GZIP Module with Python

I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:

import gzip
import glob
import os
import shutil
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()

The .gz files are in the correct location, and I can print the full path + filename with the print command, but the GZIP module isn't getting executed properly. What am I missing?

Upvotes: 24

Views: 46867

Answers (5)

loopbackbee

Reputation: 23342

If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.

The precise definition of "decompressed" varies with context:

I do not want to read the files, only uncompress them

The gzip module doesn't work like a desktop archiving program such as 7-Zip: you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer's RAM", not "opening the file in the GUI".

What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".

inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()

With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:

with open(out_filename, 'wb') as out_file:
    out_file.write(s)
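Putting the reading and writing snippets together, a minimal sketch of "uncompress" in the desktop-archiver sense might look like this. The function name gunzip_file and its parameters are hypothetical; like the snippets above, it holds the whole decompressed file in memory, and it only deletes the .gz file if you ask it to:

```python
import gzip
import os


def gunzip_file(gz_path, out_path=None, remove_original=False):
    """Decompress gz_path into out_path (hypothetical helper).

    If out_path is None, the '.gz' suffix is stripped from gz_path.
    The entire decompressed content is held in memory at once.
    """
    if out_path is None:
        if not gz_path.endswith(".gz"):
            raise ValueError("cannot derive output name from %r" % gz_path)
        out_path = gz_path[:-3]

    # Read: decompress the whole stream into a bytes object.
    with gzip.open(gz_path, "rb") as in_file:
        data = in_file.read()

    # Write: store the decompressed bytes in a new file (binary mode).
    with open(out_path, "wb") as out_file:
        out_file.write(data)

    # Optionally behave like gunzip and delete the compressed file.
    if remove_original:
        os.remove(gz_path)
    return out_path
```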

If you're dealing with very large files (larger than the amount of your RAM), you'll need to adopt a different approach. But that is the topic for another question.
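For reference, one common way to handle such files (not part of the original answer) is to stream the data in chunks with shutil.copyfileobj, so only one buffer is in memory at a time. A minimal sketch; the name gunzip_streaming and the 1 MiB chunk size are illustrative choices:

```python
import gzip
import shutil


def gunzip_streaming(gz_path, out_path, chunk_size=1024 * 1024):
    """Decompress gz_path to out_path without loading it all into RAM."""
    with gzip.open(gz_path, "rb") as in_file, open(out_path, "wb") as out_file:
        # copyfileobj reads and writes at most chunk_size bytes per iteration.
        shutil.copyfileobj(in_file, out_file, chunk_size)
```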

Upvotes: 40

Dalupus

Reputation: 1120

I think there is a much simpler solution than the others presented, given that the OP only wanted to extract all the files in a directory:

import glob
from setuptools import archive_util

for fn in glob.glob('*.gz'):
    archive_util.unpack_archive(fn, '.')
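setuptools.archive_util comes from setuptools rather than the standard library, and as far as I can tell its unpack_archive only recognises directories, zip files and tar archives (including .tar.gz), not a plain single-file .gz. For tarballs, the standard library's shutil.unpack_archive is a comparable one-call option. This sketch assumes .tar.gz files and wraps the loop in a hypothetical helper, unpack_all:

```python
import glob
import shutil


def unpack_all(pattern="*.tar.gz", dest="."):
    """Extract every archive matching pattern into dest (hypothetical helper)."""
    for fn in glob.glob(pattern):
        # The format is inferred from the extension (".tar.gz" -> "gztar").
        shutil.unpack_archive(fn, dest)
```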

Upvotes: 0

Martin Thoma

Reputation: 136845

You should use a with statement to open files and, of course, store the result of reading the compressed file. See the gzip documentation:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
    if not os.path.isdir(gzip_path):
        with gzip.open(gzip_path, 'rb') as in_file:
            s = in_file.read()

        # Now store the uncompressed data
        path_to_store = gzip_path[:-3]  # remove the '.gz' from the filename

        # store uncompressed file data from 's' variable (binary mode, since s is bytes)
        with open(path_to_store, 'wb') as f:
            f.write(s)

Depending on what exactly you want to do, you might want to have a look at tarfile and its 'r:gz' option for opening files.
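If the .gz files are actually gzip-compressed tar archives (.tar.gz), the tarfile route could look like the sketch below. extract_tar_gz is a hypothetical helper name; it uses the "data" extraction filter where the running Python provides it (detected via tarfile.data_filter), because extracting untrusted tarballs without a filter can write outside dest_dir:

```python
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir; return member names."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject members that would escape dest_dir, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names
```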

Upvotes: 6

user3111358

Reputation: 311

I was able to resolve this issue by using the subprocess module:

import glob
import os
import shutil
import subprocess

for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(file):
        shutil.copy(file, FILE_DIR)
        # uncompress the copy in the working area
        subprocess.call(["gunzip", os.path.join(FILE_DIR, os.path.basename(file))])

Since my goal was simply to uncompress the archive, the above code accomplishes this. The archived files are kept in a central location and are copied to a working area, uncompressed, and used in a test case. The GZIP module was too complicated for what I was trying to accomplish.
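For comparison, a minimal pure-Python sketch of the same staging step, assuming the goal is simply to end up with uncompressed files in the working area. It decompresses straight from the central location into the working directory, so neither an intermediate copy nor an external gunzip is needed, and unlike gunzip it never deletes a .gz file. stage_and_uncompress is a hypothetical name:

```python
import glob
import gzip
import os
import shutil


def stage_and_uncompress(src_dir, work_dir):
    """Decompress every *.gz in src_dir into work_dir (hypothetical helper).

    The compressed originals in src_dir are left untouched.
    Returns the list of paths that were written.
    """
    written = []
    for gz_path in glob.glob(os.path.join(src_dir, "*.gz")):
        if os.path.isdir(gz_path):
            continue
        # Like gunzip, drop the ".gz" suffix for the output name.
        out_name = os.path.basename(gz_path)[:-3]
        out_path = os.path.join(work_dir, out_name)
        with gzip.open(gz_path, "rb") as in_file, open(out_path, "wb") as out_file:
            # Stream in chunks rather than reading everything into memory.
            shutil.copyfileobj(in_file, out_file)
        written.append(out_path)
    return written
```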

Thanks for everyone's help. It is much appreciated!

Upvotes: 4

Jan Spurny

Reputation: 5537

You're decompressing the file into the variable s and then doing nothing with it. You should stop searching Stack Overflow and at least read the Python tutorial. Seriously.

Anyway, there are several things wrong with your code:

  1. you need to STORE the unzipped data from s in some file.

  2. there's no need to copy the *.gz files at all, because your code unpacks the original gzip file, not the copy.

  3. you're using file as a variable name, which shadows a built-in name in Python 2 (it is not a reserved word). This is not an error, just very bad practice.

This should probably do what you wanted:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if os.path.isdir(gzip_path) == False:
        inF = gzip.open(gzip_path, 'rb')
        # uncompress the gzip_path INTO THE 's' variable
        s = inF.read()
        inF.close()

        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)

        # store uncompressed file data from 's' variable (binary mode, since s is bytes)
        with open(uncompressed_path, 'wb') as out_file:
            out_file.write(s)

Upvotes: 6
