Reputation: 311
I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:
import gzip
import glob
import os
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
#print file
if os.path.isdir(file) == False:
shutil.copy(file, FILE_DIR)
# uncompress the file
inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()
the .gz files are in the correct location, and I can print the full path + filename with the print command, but the GZIP module isn't getting executed properly. what am I missing?
Upvotes: 24
Views: 46867
Reputation: 23342
If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.
The precise definition of "decompressed" varies on context:
I do not want to read the files, only uncompress them
The gzip
module doesn't work as a desktop archiving program like 7-zip - you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer RAM", not "opening the file in the GUI".
What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read a in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)"
inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()
With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:
with open(out_filename, 'wb') as out_file:
out_file.write(s)
If you're dealing with very large files (larger than the amount of your RAM), you'll need to adopt a different approach. But that is the topic for another question.
Upvotes: 40
Reputation: 1120
I think there is a much simpler solution than the others presented given the op only wanted to extract all the files in a directory:
import glob
from setuptools import archive_util
for fn in glob.glob('*.gz'):
archive_util.unpack_archive(fn, '.')
Upvotes: 0
Reputation: 136845
You should use with
to open files and, of course, store the result of reading the compressed file. See gzip
documentation:
import gzip
import glob
import os
import os.path
for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
if not os.path.isdir(gzip_path):
with gzip.open(gzip_path, 'rb') as in_file:
s = in_file.read()
# Now store the uncompressed data
path_to_store = gzip_fname[:-3] # remove the '.gz' from the filename
# store uncompressed file data from 's' variable
with open(path_to_store, 'w') as f:
f.write(s)
Depending on what exactly you want to do, you might want to have a look at tarfile
and its 'r:gz'
option for opening files.
Upvotes: 6
Reputation: 311
I was able to resolve this issue by using the subprocess module:
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
if os.path.isdir(file) == False:
shutil.copy(file, FILE_DIR)
# uncompress the file
subprocess.call(["gunzip", FILE_DIR + "/" + os.path.basename(file)])
Since my goal was to simply uncompress the archive, the above code accomplishes this. The archived files are located in a central location, and are copied to a working area, uncompressed, and used in a test case. the GZIP module was too complicated for what I was trying to accomplish.
Thanks for everyone's help. It is much appreciated!
Upvotes: 4
Reputation: 5537
You're decompressing file into s
variable, and do nothing with it. You should stop searching stackoverflow and read at least python tutorial. Seriously.
Anyway, there's several thing wrong with your code:
you need is to STORE the unzipped data in s
into some file.
there's no need to copy the actual *.gz
files. Because in your code, you're unpacking the original gzip file and not the copy.
you're using file
, which is a reserved word, as a variable. This is not
an error, just a very bad practice.
This should probably do what you wanted:
import gzip
import glob
import os
import os.path
for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
if os.path.isdir(gzip_path) == False:
inF = gzip.open(gzip_path, 'rb')
# uncompress the gzip_path INTO THE 's' variable
s = inF.read()
inF.close()
# get gzip filename (without directories)
gzip_fname = os.path.basename(gzip_path)
# get original filename (remove 3 characters from the end: ".gz")
fname = gzip_fname[:-3]
uncompressed_path = os.path.join(FILE_DIR, fname)
# store uncompressed file data from 's' variable
open(uncompressed_path, 'w').write(s)
Upvotes: 6