Reputation: 901
I am working on a project in Python in which I need to extract only one subfolder of a tar archive, not all of its files. I tried:
tar = tarfile.open(tarfile)
tar.extract("dirname", targetdir)
But this does not work: it does not extract the given subdirectory, and no exception is thrown. I am a beginner in Python. Also, if the above function doesn't work for directories, what's the difference between this command and tar.extractfile()?
Upvotes: 13
Views: 19515
Reputation: 3473
The problem with all of the other solutions is that they need to read to the end of the file before extracting, which means they cannot be applied to a stream that does not support seeking.
Starting with Python 3.11.4 (I haven't found a way with earlier versions):
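import pathlib
import tarfile
# Drop the first path component of every member before it is extracted.
strip1 = lambda member, path: member.replace(name=str(pathlib.Path(*pathlib.Path(member.path).parts[1:])))
with tarfile.open('file.tar.gz', mode='r:gz') as tar:
    tar.extractall(path=dest, filter=strip1)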
extractall accepts a filter that gets called for each member with its TarInfo and the destination path: you unpack the filename, take all parts except the first one, and then repack it.
Upvotes: 0
Reputation: 3239
The other answer will retain the subfolder path, meaning that subfolder/a/b will be extracted to ./subfolder/a/b. To extract a subfolder to the root, so that subfolder/a/b is extracted to ./a/b, you can rewrite the paths with something like this:
import tarfile

def members(tf):
    prefix = "subfolder/"
    for member in tf.getmembers():
        if member.path.startswith(prefix):
            # Strip the leading "subfolder/" so the files land in the target root.
            member.path = member.path[len(prefix):]
            yield member

with tarfile.open("sample.tar") as tar:
    tar.extractall(members=members(tar))
Upvotes: 19
Reputation: 8731
Building on the second example from the tarfile module documentation, you could extract the contained sub-folder and all of its contents with something like this:
import tarfile

with tarfile.open("sample.tar") as tar:
    subdir_and_files = [
        tarinfo for tarinfo in tar.getmembers()
        if tarinfo.name.startswith("subfolder/")
    ]
    tar.extractall(members=subdir_and_files)
This creates a list of the subfolder and its contents, and then uses the recommended extractall() method to extract just them. Of course, replace "subfolder/" with the actual path (relative to the root of the tar file) of the sub-folder you want to extract.
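On the second part of the question, here is a minimal sketch of how extract() and extractfile() differ. The archive and its member names are invented for the demo. extract() writes exactly one member to disk, so for a directory member it creates only the directory, which is why the original attempt appeared to do nothing. extractfile() writes nothing to disk and just hands back a readable file object.

```python
import io
import os
import tarfile
import tempfile

# Build a tiny in-memory archive; the member names are made up for the demo.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("subfolder")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)
    data = b"hello\n"
    info = tarfile.TarInfo("subfolder/a.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

with tarfile.open(fileobj=buf) as tar, tempfile.TemporaryDirectory() as dest:
    # extract() writes one member to disk; for a directory member that
    # means creating the directory itself, not the files beneath it.
    tar.extract("subfolder", path=dest)
    dir_listing = os.listdir(os.path.join(dest, "subfolder"))

    # extractfile() writes nothing to disk; it returns a readable file
    # object for a regular file, and None for a directory.
    content = tar.extractfile("subfolder/a.txt").read()
    dir_result = tar.extractfile("subfolder")
```

After this runs, dir_listing is empty, content holds the file's bytes, and dir_result is None. That is why the contents of a directory have to be selected explicitly, as in the list comprehension above.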
Upvotes: 23