aye

Reputation: 81

How to create a tar.gz archive in Python/tar without including the parent directory?

I have a FolderA which contains FolderB and FileB. How can I create a tar.gz archive which ONLY contains FolderB and FileB, removing the parent directory FolderA? I'm using Python and I'm running this code on a Windows machine.

The best lead I found was: How to create full compressed tar file using Python?

In the most upvoted answer, people discuss ways to remove the parent directory, but none of them work for me. I've tried arcname, os.walk, and running the tar command via subprocess.call().

I got close with os.walk, but in the code below, it still drops a " _ " directory in with FolderB and FileB. So, the file structure is ARCHIVE.tar.gz > ARCHIVE.tar > "_" directory, FolderB, FileB.

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        length = len(source_dir)
        for root, dirs, files in os.walk(source_dir):
            folder = root[length:]  # path without "parent"
            for file in files:
                tar.add(os.path.join(root, folder), folder)

I make the archive using:

make_tarfile('ARCHIVE.tar.gz', 'C:\FolderA')

Should I carry on using os.walk, or is there any other way to solve this?

Update

Here is an image showing the contents of my archive. As you can see, there is a " _ " folder in my archive that I want to get rid of. Oddly enough, when I extract, only FolderA and FileB.html appear as archived. In essence, the behavior is correct, but if I could take the last step and remove the " _ " folder from the archive, that would be perfect. I'm going to ask an updated question to limit confusion.

Upvotes: 6

Views: 11816

Answers (4)

Hardian Lawi

Reputation: 608

You could use subprocess to call the tar command, which achieves something similar and is much faster. This assumes a tar executable is available on the PATH.

import subprocess

def make_tarfile(output_filename, source_dir):
    subprocess.call(["tar", "-C", source_dir, "-zcvf", output_filename, "."])

Upvotes: 1

driedler

Reputation: 4190

This works for me:

import os
import tarfile

with tarfile.open(output_filename, "w:gz") as tar:
    for fn in os.listdir(source_dir):
        p = os.path.join(source_dir, fn)
        # arcname=fn stores the entry without the source_dir prefix
        tar.add(p, arcname=fn)

That is, list the top-level entries of the source directory and add each one to the archive. There is no need to walk the source directory, because tar.add() adds a directory recursively by default.
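
As a rough check of the above, here is a minimal sketch that builds a throwaway FolderA tree, archives it the same way, and lists what ended up in the archive. The temporary tree, the FileC file, and the helper names archived_names and demo are assumptions made for illustration, not part of the original setup.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    """Archive the contents of source_dir without the directory itself."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for fn in sorted(os.listdir(source_dir)):
            # arcname drops the parent path; directories are added recursively
            tar.add(os.path.join(source_dir, fn), arcname=fn)


def archived_names(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


def demo():
    """Build a hypothetical FolderA tree, archive it, return the member names."""
    with tempfile.TemporaryDirectory() as tmp:
        folder_a = os.path.join(tmp, "FolderA")
        os.makedirs(os.path.join(folder_a, "FolderB"))
        for rel in ("FileB", os.path.join("FolderB", "FileC")):
            with open(os.path.join(folder_a, rel), "w") as fh:
                fh.write("example\n")
        archive = os.path.join(tmp, "ARCHIVE.tar.gz")
        make_tarfile(archive, folder_a)
        return archived_names(archive)


if __name__ == "__main__":
    print(demo())  # prints ['FileB', 'FolderB', 'FolderB/FileC']
```

None of the member names carries a FolderA prefix, so extracting the archive yields FolderB and FileB directly.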

Upvotes: 6

Kacperito

Reputation: 1367

Here is a function to perform the task. I have had some issues extracting the tar file on Windows (with WinRAR), as it seemed to try to extract the same file twice, but I think it will work fine when extracting the archive properly.

"""
The directory structure I have is as follows:

├───FolderA
│   │   FileB
│   │
│   └───FolderB
│           FileC
"""

import tarfile
import os

# This is where I stored FolderA on my computer
ROOT = os.path.join(os.path.dirname(__file__), "FolderA")


def make_tarfile(output_filename: str, source_dir: str) -> bool:
    """ 
    :return: True on success, False otherwise
    """

    # This is where the path to each file and folder will be saved
    paths_to_tar = set()

    # os.walk over the root folder ("FolderA") - note it will never get added
    for dirpath, dirnames, filenames in os.walk(source_dir):

        # Resolve path issues, for example for Windows
        dirpath = os.path.normpath(dirpath)

        # Add each folder and path in the current directory
        # Probably could use zip here instead of set unions but can't be bothered to try to figure it out
        paths_to_tar = paths_to_tar.union(
            {os.path.join(dirpath, d) for d in dirnames}).union(
            {os.path.join(dirpath, f) for f in filenames})

    try:
        # This will create the tar file in the current directory
        with tarfile.open(output_filename, "w:gz") as tar:

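            # Add each path under its name relative to source_dir.
            # recursive=False because os.walk already collected every nested
            # path; the default (recursive=True) would store those files twice.
            for path in sorted(paths_to_tar):
                tar.add(path, arcname=os.path.relpath(path, source_dir),
                        recursive=False)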
            return True

    except (tarfile.TarError, OSError) as e:
        print(f"An error occurred - {e}")
        return False


if __name__ == '__main__':
    make_tarfile("tarred_files.tar.gz", ROOT)

Upvotes: 0

the_good_pony

Reputation: 500

I've tried to provide some examples of how changes to the source directory make a difference to what finally gets extracted.

As per your example, I have this folder structure

[image: folder structure]

I use this Python code to generate the tar file (lifted from here)

import tarfile
import os

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

Which data and structure end up in the tar file depends on the location I pass as the source_dir parameter.

So this location parameter,

make_tarfile('folder.tar.gz','folder_A/' )

will generate this result when extracted

[image: extracted result]

If I move into folder_A and reference folder_B,

make_tarfile('folder.tar.gz','folder_A/folder_B' )

This is what the extract will be,

[image: extracted result]

Notice that folder_B is the root of this extract.

Now finally,

make_tarfile('folder.tar.gz','folder_A/folder_B/' )

Will extract to this

[image: extracted result]

Just the file is included in the extract.
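
The difference between the last two calls comes down to what os.path.basename returns, since the function passes that value as arcname. A small sketch, assuming forward-slash paths as in the calls above; the helper name arcname_for is hypothetical, and posixpath is used only to make the result the same on every platform:

```python
import posixpath


def arcname_for(source_dir):
    """Mirror the arcname that make_tarfile derives from source_dir."""
    # posixpath.basename treats "/" as the separator on every platform;
    # os.path.basename gives the same results for these paths.
    return posixpath.basename(source_dir)


for source_dir in ("folder_A/folder_B", "folder_A/folder_B/"):
    print(repr(source_dir), "->", repr(arcname_for(source_dir)))
# 'folder_A/folder_B' -> 'folder_B'
# 'folder_A/folder_B/' -> ''
```

With the trailing slash the arcname is empty, so the directory itself gets no name in the archive, which would explain why only its contents appear on extraction.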

Upvotes: 0
