mistyped

Extract the content of a specific folder from a zip archive in Python3

I have a zip archive whose internal structure looks like this:

file.zip
  |
   --- foo/
  |
   --- bar/
        |
         --- file1.txt
        |
         --- dir/
              |
               --- file2.txt

I would like to extract the contents of bar/ into an output directory using Python 3, so that the result looks like this:

output-dir/
    |
     --- file1.txt
    |
     --- dir/
          |
           --- file2.txt

However, when I run the code below, bar/ itself is recreated inside output-dir along with its contents, i.e. I get output-dir/bar/file1.txt instead of output-dir/file1.txt:

import zipfile

archive = zipfile.ZipFile('path/to/file.zip')

for archive_item in archive.namelist():
    if archive_item.startswith('bar/'):
        archive.extract(archive_item, 'path/to/output-dir')

How can I tackle this problem? Thanks!
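
The cause is that ZipFile.extract always recreates the member's full archive name under the target directory, so a member named bar/file1.txt is written to output-dir/bar/file1.txt. Below is a minimal, self-contained reproduction. It builds a throwaway archive with the layout above inside a temporary directory; the helper name reproduce and the temporary paths are illustrative only and stand in for the real path/to/... locations.

```python
import pathlib
import tempfile
import zipfile


def reproduce(workdir):
    """Build a zip shaped like file.zip above, extract bar/ with
    ZipFile.extract, and return the extracted files relative to output-dir."""
    zip_path = workdir / "file.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("foo/", "")  # empty directory entry
        zf.writestr("bar/file1.txt", "one")
        zf.writestr("bar/dir/file2.txt", "two")

    out = workdir / "output-dir"
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.startswith("bar/"):
                # extract() recreates the full member name under `out`,
                # so the "bar/" prefix is kept
                archive.extract(name, out)

    return sorted(
        p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    files = reproduce(pathlib.Path(tmp))
    print(files)  # ['bar/dir/file2.txt', 'bar/file1.txt']
```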

Answers (1)

Masklinn

Instead of ZipFile.extract, use ZipFile.open together with the built-in open and shutil.copyfileobj, so you can write each file exactly where you want it. Build the destination path yourself by stripping the bar/ prefix from each member name and joining the remainder to the output directory.

import os
import pathlib
import shutil
import zipfile

PREFIX = 'bar/'
out = pathlib.Path('path/to/output-dir')

with zipfile.ZipFile('path/to/file.zip') as archive:
    for archive_item in archive.namelist():
        if not archive_item.startswith(PREFIX):
            continue
        # strip out the leading prefix then join to `out`, note that you
        # may want to add some securing against path traversal if the zip
        # file comes from an untrusted source
        destpath = out.joinpath(archive_item[len(PREFIX):])
        if archive_item.endswith('/'):
            # directory entry (including 'bar/' itself): create it, there
            # is no file content to copy
            os.makedirs(destpath, exist_ok=True)
            continue
        # make sure destination directory exists otherwise `open` will fail
        os.makedirs(destpath.parent, exist_ok=True)
        with archive.open(archive_item) as source, \
             open(destpath, 'wb') as dest:
            shutil.copyfileobj(source, dest)

Something like that.
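
A self-contained variant of the same approach, wrapped in a function and with one possible path-traversal check added, could look like the sketch below. The function name extract_subdir and the check itself are assumptions for illustration, not part of the zipfile API: the check refuses any member whose resolved destination falls outside the output directory (e.g. a name like bar/../../evil.txt). Directory entries are detected with ZipInfo.is_dir() (Python 3.6+) and created instead of opened.

```python
import os
import pathlib
import shutil
import tempfile
import zipfile


def extract_subdir(zip_path, prefix, out_dir):
    """Hypothetical helper: copy every member under `prefix` into `out_dir`,
    dropping `prefix` from the destination path. Returns the written files."""
    out = pathlib.Path(out_dir).resolve()
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if not info.filename.startswith(prefix):
                continue
            rel = info.filename[len(prefix):]
            destpath = (out / rel).resolve()
            # assumed safety rule: refuse members that would escape out_dir
            if destpath != out and out not in destpath.parents:
                raise ValueError(f"unsafe member name: {info.filename!r}")
            if info.is_dir():
                # directory entry: create it, there is nothing to copy
                os.makedirs(destpath, exist_ok=True)
                continue
            os.makedirs(destpath.parent, exist_ok=True)
            with archive.open(info) as source, open(destpath, "wb") as dest:
                shutil.copyfileobj(source, dest)
            written.append(destpath.relative_to(out).as_posix())
    return sorted(written)


# Usage: build a small archive shaped like file.zip and extract bar/ only.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "file.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("bar/", "")
        zf.writestr("bar/file1.txt", "one")
        zf.writestr("bar/dir/file2.txt", "two")
    result = extract_subdir(zip_path, "bar/", os.path.join(tmp, "output-dir"))
    print(result)  # ['dir/file2.txt', 'file1.txt']
```

Resolving both paths before comparing them is what makes the containment test meaningful; comparing the unresolved strings would let ".." segments slip through.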