Reputation: 81

Python: Delete file from the TAR archive using tarfile

Is it possible to remove a file from a TAR archive using tarfile?

For example:

If an x.tar file includes the files a.txt, b.txt and c.txt, is it possible to remove a.txt?

In other words: is there a Python solution that achieves something like this:

tar -vf x.tar --delete a.txt?

Upvotes: 5

Answers (3)

Ken

Reputation: 602

In fact, it is possible... but with huge restrictions. You can only delete the end/tail of the archive, not files at the beginning or in the middle of it.
I just had a similar need for extracting files from a huge tar (450G) without enough space for both the tar and the extracted files. I had to extract files one at a time and remove them from the .tar as soon as they were extracted.
The command tar -vf x.tar --delete a.txt did not solve that for me: in my case it did not shrink x.tar (the archive stayed the same size), it only removed a.txt from the list of contained files (so a.txt is not extracted when x.tar is unpacked later).
The only thing you can do with .tar files, because they are sequential, is to truncate them. So the only solution is to extract files from the end.
First you get the list of all the members of the tar file:

import tarfile

with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_members = tar_file.getmembers()

Then you can extract the files you want from the end:

with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_file.extractall(path=extracting_dir, members=tar_members[first_of_files_to_extract:])

You compute where to truncate the file (in bytes):

truncate_size = tar_members[first_of_files_to_extract].offset

Then you add the end-of-archive marker, i.e. two consecutive blocks of null bytes. Each block is 512 bytes long in .tar, so you need 1024 null bytes at the end. Note that a member's data is only padded with nulls up to the next 512-byte boundary; it is not followed by a null block of its own, so both blocks have to be added here.

new_file_size = truncate_size + 1024  # two blocks of 512 null bytes

And finally you do the two truncations: the first removes the last members, the second appends the null bytes (here we no longer open the .tar with tarfile.open(); truncation is just a regular file operation):

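with open(tar_file_path, "r+b") as tar_file:  # read/write binary; the default mode "r" cannot truncate
    tar_file.truncate(truncate_size)  # drop the extracted members
    tar_file.truncate(new_file_size)  # grow again; the new bytes are zero-filled on most platforms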

At this point you have extracted files from the end of the .tar and you have a new, valid .tar file, smaller than the previous one by the size of the extracted files plus a few blocks, and you have limited the extra disk usage to the size of the extracted files. I personally did that file by file (extract the last file, truncate, extract the last file, truncate, etc.).
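For reference, the steps above can be put together into one helper. This is only a minimal sketch, not a hardened tool: the function name pop_last_member is made up for illustration, it assumes an uncompressed .tar, and it writes the 1024 null bytes explicitly instead of relying on truncate() to zero-fill.

```python
import os
import tarfile


def pop_last_member(tar_path, extract_dir):
    """Extract the last member of an uncompressed tar, then cut it off the archive.

    Returns the name of the extracted member, or None if the archive is empty.
    (Hypothetical helper name; assumes a plain, uncompressed .tar file.)
    """
    with tarfile.open(tar_path, mode="r") as tar:
        members = tar.getmembers()
        if not members:
            return None
        last = members[-1]
        tar.extract(last, path=extract_dir)

    # last.offset is the byte position where the last member's header starts.
    with open(tar_path, "r+b") as raw:
        raw.truncate(last.offset)      # drop the last member
        raw.seek(last.offset)
        raw.write(b"\0" * 1024)        # end-of-archive marker: two 512-byte null blocks
    return last.name
```

Calling it in a loop until it returns None empties the archive one file at a time, which is the file-by-file approach described above.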

Upvotes: 3

user12025948

Reputation:

I had a similar problem and ended up using the 7z command line (7za.exe), since it supports more functions than Python's tarfile, including deleting files from an archive.

The downside of this solution is that you need to carry the 7za.exe file around with the program.

In your case, you could use something like

os.system("7za d x.tar a.txt")

Do keep in mind, however, that os.system is discouraged in favor of the subprocess module. I've never used it, so I can't really help more.
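A subprocess version of the same call might look like the sketch below. It assumes a 7za executable is reachable on the PATH (or that its full path is passed as exe); the two function names are made up for illustration.

```python
import subprocess


def build_7za_delete_command(archive_path, member_name, exe="7za"):
    """Build the argument list for `7za d <archive> <member>` (hypothetical helper)."""
    return [exe, "d", archive_path, member_name]


def delete_with_7za(archive_path, member_name, exe="7za"):
    """Run 7za to delete one member; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_7za_delete_command(archive_path, member_name, exe), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, e.g. delete_with_7za("x.tar", "a.txt").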

Upvotes: 1

Yserbius

Reputation: 1414

Not with tarfile directly, although there may be some other library out there. A quick hack you can do is to extract the files, then recreate the tar minus the files you want to delete.
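A rough sketch of that hack, with one variation: instead of extracting everything to disk first, it copies each kept member straight into a new archive. The function name rebuild_without and the ".tmp" suffix are arbitrary choices, and you still need enough free space for the second archive while it is being written.

```python
import os
import tarfile


def rebuild_without(tar_path, names_to_drop):
    """Rewrite tar_path, leaving out the members listed in names_to_drop.

    Hypothetical helper: members are streamed into a temporary archive,
    which then replaces the original file.
    """
    names_to_drop = set(names_to_drop)
    tmp_path = tar_path + ".tmp"
    with tarfile.open(tar_path, mode="r") as src, tarfile.open(tmp_path, mode="w") as dst:
        for member in src:
            if member.name in names_to_drop:
                continue
            # Only regular files carry data; directories, links etc. are header-only.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, fileobj)
    os.replace(tmp_path, tar_path)
```

For the example in the question this would be rebuild_without("x.tar", ["a.txt"]).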

Upvotes: 3
