ChuNan

Reputation: 1141

How to read text files in a zipped folder in Python

I have a set of data files in a folder, and the folder is zipped. I want to read each file without unzipping the archive. I have tried several methods, but none of them gets me into the folder inside the zip file. How can I achieve that?

Without folder in the zip file:

import zipfile

with zipfile.ZipFile('data.zip') as z:
    for filename in z.namelist():
        with z.open(filename) as f:
            data = f.readlines()

With one folder:

with zipfile.ZipFile('data.zip') as z:
    for filename in z.namelist():
        if filename.endswith('/'):
            # This is where I got stuck

Upvotes: 23

Views: 48432

Answers (4)

Joe P

Reputation: 485

I got grofte's code to work. I made some minor additions: when dealing with command-line input, it's important to handle exceptions. I also added a few print statements to make clear what's going on.

import sys
import zipfile

archive = sys.argv[1] # assuming launched with `python my_script.py archive.zip`

try:
    with zipfile.ZipFile(archive) as z:
        for filename in z.namelist():
            # directory entries in a zip archive end with '/'
            if not filename.endswith('/'):
                print(f'\nFile "{filename}":')
                # read the file
                for line in z.open(filename):
                    # each line keeps its own newline
                    print(line.decode('utf-8'), end='')
            else:
                print(f'\nDirectory "{filename}"')
except zipfile.BadZipFile:
    print(f'Bad zip file: "{archive}"')
except IsADirectoryError:
    print(f'Directory, not file: "{archive}"')
except FileNotFoundError:
    print(f'File not found: "{archive}"')
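
If you want the script to behave well when called from other scripts, here is a rough sketch that also covers a missing command-line argument and bytes that are not valid UTF-8, and returns an exit code instead of just printing. The function name main and the exit codes 0, 1 and 2 are my own choices for this illustration, not anything zipfile requires.

```python
import sys
import zipfile


def main(argv):
    """Print every file in the archive named by argv[1]; return an exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} ARCHIVE.zip", file=sys.stderr)
        return 2
    archive = argv[1]
    try:
        with zipfile.ZipFile(archive) as z:
            for info in z.infolist():
                # skip directory entries (names ending with '/')
                if info.is_dir():
                    continue
                print(f'\nFile "{info.filename}":')
                # errors="replace" keeps going on bytes that are not valid UTF-8
                print(z.read(info).decode("utf-8", errors="replace"), end="")
    except (zipfile.BadZipFile, OSError) as exc:
        # OSError covers FileNotFoundError, IsADirectoryError and PermissionError
        print(f'Cannot read "{archive}": {exc}', file=sys.stderr)
        return 1
    return 0
```

To run it as a script, call sys.exit(main(sys.argv)) at the bottom of the file.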

Upvotes: 1

grofte

Reputation: 2119

I got RichS' code to work. I made some minor edits:

import sys
import zipfile

archive = sys.argv[1] # assuming launched with `python my_script.py archive.zip`

with zipfile.ZipFile(archive) as z:
    for filename in z.namelist():
        # directory entries in a zip archive end with '/'
        if not filename.endswith('/'):
            # read the file
            for line in z.open(filename):
                print(line.decode('utf-8'))

As you can see the edits are minor. I've switched to Python 3, the ZipFile class has a capital F, and the output is converted from b-strings to unicode strings. Only decode if you are trying to unzip a text file.

PS I'm not dissing RichS at all. I just thought it would be hilarious: both useful and a mild shitpost.

PPS You can read a file from a password-protected archive with ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False) or ZipFile.read(name, pwd=None); pwd must be a bytes object. ZipFile.read returns the whole file as bytes, so there is no context manager and nothing to loop over, and you would simply do

            # read the file
            print(z.read(filename).decode('utf-8'))
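
To make the pwd part concrete, here is a minimal sketch that wraps ZipFile.read and converts a str password to bytes first. The helper name read_member_text and its parameters are made up for this example. Keep in mind that the zipfile module can decrypt members but cannot create encrypted archives, so the archive has to come from another tool.

```python
import zipfile


def read_member_text(archive_path, member, password=None, encoding="utf-8"):
    """Return one archive member decoded as text.

    `read_member_text` is a hypothetical helper, not part of zipfile.
    `password` is a str, or None for an unencrypted archive.
    """
    # zipfile expects the password as bytes, not str
    pwd = password.encode("utf-8") if password is not None else None
    with zipfile.ZipFile(archive_path) as z:
        # a wrong or missing password for an encrypted member raises RuntimeError
        return z.read(member, pwd=pwd).decode(encoding)
```

You would call it as read_member_text('data.zip', 'folder/file.txt', password='secret'), where the archive name, member name and password are placeholders.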

Upvotes: 3

alecxe

Reputation: 473873

namelist() returns the names of all entries in an archive, including those inside folders.

You can check whether an item is a directory by testing whether its name ends with '/' (os.path.isdir() would look at the local filesystem, not inside the archive):

import zipfile

with zipfile.ZipFile('archive.zip') as z:
    for filename in z.namelist():
        if not filename.endswith('/'):
            # read the file
            with z.open(filename) as f:
                for line in f:
                    print(line)
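
On Python 3.6+ a variant along these lines may be more convenient. It is only a sketch, assuming the files are UTF-8 text: ZipInfo.is_dir() identifies directory entries, and io.TextIOWrapper decodes the binary stream from z.open() so that lines come back as str. The helper name read_text_files and its folder parameter are invented for this illustration.

```python
import io
import zipfile


def read_text_files(archive_path, folder="", encoding="utf-8"):
    """Return {member name: list of lines} for files under `folder`.

    `read_text_files` and its parameters are illustrative names, not part
    of the zipfile module.
    """
    contents = {}
    with zipfile.ZipFile(archive_path) as z:
        for info in z.infolist():
            # directory entries have names ending with '/'; skip them
            if info.is_dir():
                continue
            # member names use '/' as the separator on every OS
            if not info.filename.startswith(folder):
                continue
            with z.open(info) as raw:
                # z.open() yields bytes; wrap it to decode line by line
                text = io.TextIOWrapper(raw, encoding=encoding)
                contents[info.filename] = [line.rstrip("\n") for line in text]
    return contents
```

For example, read_text_files('archive.zip', folder='data/') would return only the files stored under a folder named data/ (a placeholder name).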

Hope that helps.

Upvotes: 41

RichS

Reputation: 943

I got Alec's code to work, with some minor edits. Note that this won't work with password-protected zip files.

import sys
import zipfile

z = zipfile.ZipFile(sys.argv[1])  # Flexibility with regard to zipfile

for filename in z.namelist():
    # directory entries in a zip archive end with '/'
    if not filename.endswith('/'):
        # read the file
        for line in z.open(filename):
            print(line)
z.close()                        # Close the archive once every file has been read
del z                            # Cleanup (in case there's further work after this)

Upvotes: 7
