Reputation: 81
I have a FolderA which contains FolderB and FileB. How can I create a tar.gz archive which ONLY contains FolderB and FileB, removing the parent directory FolderA? I'm using Python and I'm running this code on a Windows machine.
The best lead I found was: How to create full compressed tar file using Python?
In the most upvoted answer, people discuss ways to remove the parent directory, but none of them work for me. I've tried arcname, os.walk, and running the tar command via subprocess.call().
I got close with os.walk, but with the code below it still puts a "_" directory alongside FolderB and FileB. The resulting structure is ARCHIVE.tar.gz > ARCHIVE.tar > "_" directory, FolderB, FileB.
import os
import tarfile

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        length = len(source_dir)
        for root, dirs, files in os.walk(source_dir):
            folder = root[length:]  # path without "parent"
            for file in files:
                tar.add(os.path.join(root, folder), folder)
I make the archive using:
make_tarfile('ARCHIVE.tar.gz', r'C:\FolderA')
Should I carry on using os.walk, or is there any other way to solve this?
When I look at the contents of my archive, there is a "_" folder in it that I want to get rid of. Oddly enough, when I extract, only FolderA and FileB.html appear as archived. In essence the behavior is correct, but if I could take the last step of removing the "_" folder from the archive, that would be perfect. I'm going to ask an updated question to limit confusion.
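A minimal sketch of where the "_" entry probably comes from, assuming a throwaway folder built with tempfile rather than the real C:\FolderA. On its first iteration os.walk yields source_dir itself, so root[length:] is an empty string, and tar.add(..., folder) then stores a directory entry with an empty name, which an archive viewer may show as an oddly named folder. The helpers first_walk_folder() and make_tarfile_walk() are hypothetical names; the second one keeps os.walk but names every entry with os.path.relpath and never adds source_dir itself.

```python
import os
import tarfile
import tempfile


def first_walk_folder(source_dir):
    """Return the 'folder' value the loop above computes on its first pass."""
    length = len(source_dir)
    root, _dirs, _files = next(os.walk(source_dir))  # first root is source_dir
    return root[length:]


def make_tarfile_walk(output_filename, source_dir):
    """Hypothetical os.walk variant that never adds source_dir itself."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for root, dirs, files in os.walk(source_dir):
            for name in dirs + files:
                path = os.path.join(root, name)
                # Name each entry relative to source_dir; recursive=False
                # because os.walk already visits every folder's contents.
                tar.add(path, arcname=os.path.relpath(path, source_dir),
                        recursive=False)


with tempfile.TemporaryDirectory() as tmp:
    folder_a = os.path.join(tmp, "FolderA")
    os.makedirs(os.path.join(folder_a, "FolderB"))
    with open(os.path.join(folder_a, "FileB"), "w") as f:
        f.write("b")
    first = first_walk_folder(folder_a)
    archive = os.path.join(tmp, "ARCHIVE.tar.gz")
    make_tarfile_walk(archive, folder_a)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(repr(first))  # ''
print(names)  # ['FileB', 'FolderB']
```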
Upvotes: 6
Views: 11816
Reputation: 608
You could use subprocess to achieve something similar, and much faster:
import subprocess

def make_tarfile(output_filename, source_dir):
    subprocess.call(["tar", "-C", source_dir, "-zcvf", output_filename, "."])
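A sketch of the same idea with subprocess.run(check=True), so a failing tar raises CalledProcessError instead of being silently ignored, plus a check of the member names with tarfile. It assumes a tar executable is on PATH (GNU tar on Linux, bsdtar on recent Windows and macOS); the demo is skipped if none is found. The archive path is made absolute as a precaution so it is never resolved inside source_dir, and -v is dropped to keep the output quiet. Entries typically come out with a "./" prefix.

```python
import os
import shutil
import subprocess
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    # -C switches into source_dir first, so "." means "everything inside it"
    # and FolderA itself never becomes an entry.
    subprocess.run(
        ["tar", "-C", source_dir, "-czf", os.path.abspath(output_filename), "."],
        check=True,  # raise CalledProcessError if tar fails
    )


def member_names(archive):
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


names = None
if shutil.which("tar"):  # assumption: a tar executable is on PATH
    with tempfile.TemporaryDirectory() as tmp:
        folder_a = os.path.join(tmp, "FolderA")
        os.makedirs(os.path.join(folder_a, "FolderB"))
        with open(os.path.join(folder_a, "FileB"), "w") as f:
            f.write("b")
        out = os.path.join(tmp, "ARCHIVE.tar.gz")
        make_tarfile(out, folder_a)
        names = member_names(out)
    print(names)
```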
Upvotes: 1
Reputation: 4190
This works for me:
import os
import tarfile

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for fn in os.listdir(source_dir):
            p = os.path.join(source_dir, fn)
            tar.add(p, arcname=fn)
That is, just list the root of the source directory and add each entry to the archive. There's no need to walk the source directory, because tar.add() is recursive by default when given a directory.
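A quick self-check of this approach, assuming a throwaway copy of the question's layout built with tempfile (the FileC inside FolderB is a hypothetical file added to show the recursion). Only the two top-level entries are passed to tar.add(), yet FolderB/FileC still ends up in the archive, and no FolderA prefix appears.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for fn in os.listdir(source_dir):
            # Store each top-level entry under its own name; tar.add()
            # recurses into directories itself (recursive=True by default).
            tar.add(os.path.join(source_dir, fn), arcname=fn)


with tempfile.TemporaryDirectory() as tmp:
    # Layout: FolderA/FileB and FolderA/FolderB/FileC
    folder_a = os.path.join(tmp, "FolderA")
    os.makedirs(os.path.join(folder_a, "FolderB"))
    for rel in ("FileB", os.path.join("FolderB", "FileC")):
        with open(os.path.join(folder_a, rel), "w") as f:
            f.write(rel)
    archive = os.path.join(tmp, "ARCHIVE.tar.gz")
    make_tarfile(archive, folder_a)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(names)  # ['FileB', 'FolderB', 'FolderB/FileC']
```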
Upvotes: 6
Reputation: 1367
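Here is a function to perform the task. I had some issues extracting the tar on Windows (with WinRAR), as it seemed to try to extract the same file twice. That is most likely because tar.add() recurses into directories by default, so each file inside FolderB gets added once together with FolderB and again on its own; adding each path with recursive=False avoids the duplicate entries.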
"""
The directory structure I have is as follows:
├───FolderA
│ │ FileB
│ │
│ └───FolderB
│ FileC
"""
import tarfile
import os

# This is where I stored FolderA on my computer
ROOT = os.path.join(os.path.dirname(__file__), "FolderA")


def make_tarfile(output_filename: str, source_dir: str) -> bool:
    """
    :return: True on success, False otherwise
    """
    # This is where the path to each file and folder will be saved
    paths_to_tar = set()
    # os.walk over the root folder ("FolderA"); the root itself never gets added
    for dirpath, dirnames, filenames in os.walk(source_dir):
        # Normalise separators, for example on Windows
        dirpath = os.path.normpath(dirpath)
        # Add each sub-folder and file found in the current directory
        paths_to_tar.update(os.path.join(dirpath, d) for d in dirnames)
        paths_to_tar.update(os.path.join(dirpath, f) for f in filenames)
    try:
        # This creates the tar file in the current working directory
        with tarfile.open(output_filename, "w:gz") as tar:
            # Sorting puts every folder before its own contents
            for path in sorted(paths_to_tar):
                # Store each entry under its path relative to source_dir.
                # recursive=False, because every file is already in paths_to_tar
                tar.add(path, arcname=os.path.relpath(path, source_dir),
                        recursive=False)
        return True
    except (tarfile.TarError, OSError) as e:
        print(f"An error occurred - {e}")
        return False


if __name__ == '__main__':
    make_tarfile("tarred_files.tar.gz", ROOT)
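To check whether an archive has the duplicate entries that made WinRAR try to extract the same file twice, one can count member names. In this sketch duplicate_members() and build() are hypothetical helpers; build() mimics the function above by adding every folder and file found by os.walk, once with tar.add()'s default recursion and once with recursive=False, on a throwaway layout created with tempfile.

```python
import collections
import os
import tarfile
import tempfile


def duplicate_members(archive):
    """Return the entry names that occur more than once in a tar archive."""
    with tarfile.open(archive, "r:*") as tar:
        counts = collections.Counter(tar.getnames())
    return sorted(name for name, n in counts.items() if n > 1)


def build(archive, source_dir, recursive):
    """Add every folder and file found by os.walk, like make_tarfile above."""
    with tarfile.open(archive, "w:gz") as tar:
        for dirpath, dirnames, filenames in os.walk(source_dir):
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                tar.add(path, arcname=os.path.relpath(path, source_dir),
                        recursive=recursive)


with tempfile.TemporaryDirectory() as tmp:
    folder_a = os.path.join(tmp, "FolderA")
    os.makedirs(os.path.join(folder_a, "FolderB"))
    for rel in ("FileB", os.path.join("FolderB", "FileC")):
        with open(os.path.join(folder_a, rel), "w") as f:
            f.write(rel)
    recursive_archive = os.path.join(tmp, "recursive.tar.gz")
    flat_archive = os.path.join(tmp, "flat.tar.gz")
    build(recursive_archive, folder_a, recursive=True)
    build(flat_archive, folder_a, recursive=False)
    dupes_recursive = duplicate_members(recursive_archive)
    dupes_flat = duplicate_members(flat_archive)

print(dupes_recursive)  # ['FolderB/FileC']
print(dupes_flat)  # []
```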
Upvotes: 0
Reputation: 500
I've tried to provide some examples of how changes to the source directory make a difference to what finally gets extracted.
As per your example, my folder_A contains folder_B, and folder_B contains a file.
I use this Python code to generate the tar file (lifted from here):
import tarfile
import os

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
What data and structure end up in the tar file depends on the location I pass as source_dir.
So with this location parameter,
make_tarfile('folder.tar.gz', 'folder_A/')
the extract holds the contents of folder_A, without a folder_A directory around them.
If I move into folder_A and reference folder_B,
make_tarfile('folder.tar.gz', 'folder_A/folder_B')
then folder_B is the root of the extract.
Finally, with a trailing slash,
make_tarfile('folder.tar.gz', 'folder_A/folder_B/')
just the file inside folder_B is included in the extract.
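The difference comes from os.path.basename(): when the path ends with a slash it returns an empty string, so arcname is '' and tar.add() stores the folder's contents without the folder's own name. A small sketch to check this on both path flavours; os.path is posixpath on Linux/macOS and ntpath on Windows, and both modules can be imported on any platform.

```python
import ntpath
import posixpath

# The answer's code passes arcname=os.path.basename(source_dir).
results = {}
for source_dir in ("folder_A/", "folder_A/folder_B", "folder_A/folder_B/"):
    results[source_dir] = (posixpath.basename(source_dir),
                           ntpath.basename(source_dir))
    print(repr(source_dir), "->", results[source_dir])

# 'folder_A/' -> ('', '')
# 'folder_A/folder_B' -> ('folder_B', 'folder_B')
# 'folder_A/folder_B/' -> ('', '')
```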
Upvotes: 0