ketar

Reputation: 188

Unzip gz files within folders in a main folder using python

I have gzipped (.gz) files spread across multiple folders, all inside a main folder called "usa". I was able to extract an individual file using the code below.

import gzip
import shutil
source=r"C:\usauc300.dbf.gz"
output=r"C:\usauc300.dbf"
with gzip.open(source,"rb") as f_in, open(output,"wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

I have searched high and low but can't find a Python equivalent of the command-line option gzip -dr ("decompress, recursive"), which goes through each folder and extracts the contents to the same location while deleting the original zipped file. Does anyone know how I can use Python to loop through the folders within a folder, find any gzipped files, and unzip them to the same location, replacing each zipped file with its unzipped version?

Upvotes: 4

Views: 16383

Answers (4)

Rajnish Dubey

Reputation: 1

Below is code that extracts every .gz file in a folder and replaces each one with its unzipped version. It only looks at the given folder, not its subfolders.

import os, gzip, shutil

dir_name = r'C:\Users\Desktop\log file working\New folder'

def gz_extract(directory):
    extension = ".gz"
    for item in os.listdir(directory):  # loop through items in dir
        if item.endswith(extension):  # check for ".gz" extension
            # build full paths instead of calling os.chdir, so the working directory is untouched
            gz_name = os.path.join(directory, item)
            file_name = gz_name[:-len(extension)]  # output path: same name without ".gz"
            with gzip.open(gz_name, "rb") as f_in, open(file_name, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)
            os.remove(gz_name)  # delete zipped file

gz_extract(dir_name)

Upvotes: 0

djvg

Reputation: 14255

It may not answer this specific question, but for those looking to extract a gzipped directory structure (for example a .tar.gz archive): that would be a job for shutil.unpack_archive.

For example:

import shutil

shutil.unpack_archive(
    filename='path/to/archive.tar.gz', extract_dir='where/to/extract/to'
)
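unpack_archive picks its handler from the file extension, so it is possible to check up front whether a given name will be recognised. A small sketch (can_unpack is a hypothetical helper, not part of shutil):

```python
import shutil


def can_unpack(filename):
    """Return True if a registered unpack format claims this file's extension."""
    # get_unpack_formats() yields (name, extensions, description) tuples.
    for _name, extensions, _description in shutil.get_unpack_formats():
        if any(filename.endswith(ext) for ext in extensions):
            return True
    return False


print(can_unpack("path/to/archive.tar.gz"))
print(can_unpack("usauc300.dbf.gz"))
```

With the formats Python registers by default, a .tar.gz name is recognised, while a bare .gz file such as the one in the question is not, so single gzipped files still need the gzip module.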

Upvotes: 3

Mehmet Kazanç

Reputation: 181

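You can also use tarfile together with glob. Note that this only works when the .gz files are gzip-compressed tar archives (.tar.gz); tarfile.open raises tarfile.ReadError for a plain gzipped file.

import tarfile, glob

base_dir = '/home/user/pipelines/data_files/'

for name in glob.glob(base_dir + '*.gz'):
    print(name)
    with tarfile.open(name) as tf:  # closes the archive even if extraction fails
        tf.extractall(base_dir + 'unzipped_files/')
    print('-- Done')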
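If a folder may hold both kinds of .gz file, one option is to ask tarfile.is_tarfile first and fall back to the gzip module. This is only a sketch: extract_gz and its return values are made up for illustration, and it assumes a plain gzipped file should be written out under its own name minus the ".gz".

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_gz(path, dest_dir):
    """Extract one .gz file into dest_dir, whether or not it wraps a tar archive."""
    path = Path(path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    if tarfile.is_tarfile(path):
        # Gzip-compressed tar archive: may contain many files and folders.
        with tarfile.open(path) as tf:
            tf.extractall(dest_dir)
        return "tar"
    # Plain gzip stream: a single file, named after the archive minus ".gz".
    with gzip.open(path, "rb") as f_in, open(dest_dir / path.stem, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return "gzip"
```

Only extract tar archives from sources you trust, since member paths inside an archive can point outside the destination folder.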

Upvotes: 2

Addy

Reputation: 731

I believe that's because the gzip format never operates on directories: it compresses a single stream of data, unlike zip and tar, which can bundle whole directories. Python's gzip module likewise operates on individual files. However, recursive traversal of a directory tree is easy with os.walk.

(I haven't tested this)

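import fnmatch, gzip, os, shutil

def gunzip(file_path, output_path):
    with gzip.open(file_path, "rb") as f_in, open(output_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

def recurse_and_gunzip(root):
    for dirpath, dirs, files in os.walk(root):
        for f in files:
            if fnmatch.fnmatch(f, "*.gz"):
                # os.walk yields bare file names, so join them with their folder
                gz_path = os.path.join(dirpath, f)
                gunzip(gz_path, gz_path[:-len(".gz")])  # strip only the trailing ".gz"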
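To get closer to what gzip -dr does, i.e. also removing each archive once it has been extracted, a pathlib-based variant might look like the sketch below. gunzip_tree is a hypothetical name, not a library function, and unlike the gzip command it overwrites an existing output file without asking.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.gz file under root in place, then delete the archive."""
    extracted = []
    # sorted() builds the full list first, so the tree is not modified mid-scan.
    for gz_path in sorted(Path(root).rglob("*.gz")):
        if not gz_path.is_file():
            continue  # skip directories that happen to end in ".gz"
        out_path = gz_path.with_suffix("")  # "a/b.dbf.gz" -> "a/b.dbf"
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        gz_path.unlink()  # only reached if decompression succeeded
        extracted.append(out_path)
    return extracted
```

Calling gunzip_tree("usa") (with "usa" standing in for the real path of the main folder) would return the list of files it wrote.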

Upvotes: 5
