Reputation: 81
Is it possible to remove a file from a TAR archive using tarfile?
For example:
If an x.tar file includes the files a.txt, b.txt and c.txt, is it possible to remove a.txt?
In other words, is there a Python solution that achieves something like this?
tar -vf x.tar --delete a.txt
Upvotes: 5
Views: 5095
Reputation: 602
In fact, it is possible... but with huge restrictions. You can only delete the end/tail of the archive, not files at the beginning or in the middle of it.
I just had a similar need: extracting files from a huge tar (450G) without enough space for both the tar and the extracted files. I had to extract files one at a time and remove them from the .tar as soon as they were extracted.
The command tar -vf x.tar --delete a.txt does not solve that, because it does not delete a.txt from x.tar (x.tar remains the same size); it just removes it from the list of contained files (a.txt will not be extracted when untarring x.tar later).
The only thing you can do with .tar files, because they are sequential, is to truncate them. So the only solution is to extract files from the end.
First you get the list of all the members of the tar file:
import tarfile

with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_members = tar_file.getmembers()
Then you can extract the files you want from the end:
with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_file.extractall(path=extracting_dir, members=tar_members[first_of_files_to_extract:])
You compute where to truncate the file (in bytes):
truncate_size = tar_members[first_of_files_to_extract].offset
Then you add the "end of archive" marker, i.e. two consecutive blocks of Nulls. Each block is 512 bytes long in .tar, so you need 1024 Null bytes at the end. Note that one block is not enough: the data of the previous tar_member is only padded with Nulls up to the next 512-byte boundary, it is not followed by a full Null block.
new_file_size = truncate_size + 1024  # 2 blocks of 512 Null bytes
And finally you do the two truncations: the first removes the last members, the second appends the Null bytes (here we no longer open the .tar with tarfile.open(); truncation is just a regular file operation):
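# "r+b" is required: the default read-only mode cannot truncate
with open(tar_file_path, "r+b") as tar_file:
    tar_file.truncate(truncate_size)  # drop the extracted members
    tar_file.truncate(new_file_size)  # grow again; the new bytes are Nulls on most systems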
At this point you have extracted files from the end of the .tar and you have a new valid .tar file, smaller than the previous one by roughly the size of the extracted files. The extra space needed is limited to the size of the files extracted. I personally did that file by file (extract last file, truncate, extract last file, truncate, etc.).
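Putting the steps above together, a minimal sketch could look like the following. The function name extract_tail_and_truncate and its parameters are made up for illustration, and it assumes an uncompressed .tar. Unlike the snippets above, it writes the 1024 Null bytes explicitly instead of relying on truncate() to zero-fill when growing the file.

```python
import os
import tarfile


def extract_tail_and_truncate(tar_path, first_index, dest_dir):
    """Extract members[first_index:] and cut them off the end of the archive.

    Hypothetical helper; assumes an uncompressed tar file.
    Returns the names of the members that were extracted and removed.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path, mode="r:") as tar:  # "r:" = no compression
        tail = tar.getmembers()[first_index:]
        if not tail:
            return []
        tar.extractall(path=dest_dir, members=tail)
        # Offset of the header of the first member being removed.
        truncate_size = tail[0].offset

    with open(tar_path, "r+b") as raw:
        raw.truncate(truncate_size)
        raw.seek(truncate_size)
        # End-of-archive marker: two 512-byte blocks of Nulls.
        raw.write(b"\0" * (2 * tarfile.BLOCKSIZE))
    return [member.name for member in tail]
```

Calling it with first_index set to the index of the last member, repeatedly, gives the file-by-file variant described above.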
Upvotes: 3
Reputation:
I had a similar problem and ended up using the 7z command line (7za.exe), since it supports more functions than Python's tarfile, including deleting files from an archive.
The downside of this solution is that you need to carry the 7za.exe file around with the program.
In your case, you could use something like
os.system("7za d x.tar a.txt")
Do however keep in mind that os.system is discouraged and you should use subprocess instead. Never used it, so I can't really help more.
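For reference, the same 7za call through subprocess might look roughly like this. It is only a sketch: the helper names are made up, and it assumes a 7za executable is on the PATH (otherwise pass its full path as exe).

```python
import subprocess


def build_7z_delete_command(archive, member, exe="7za"):
    """Return the argument list for '7za d <archive> <member>'."""
    return [exe, "d", archive, member]


def delete_with_7z(archive, member, exe="7za"):
    """Run the delete command (hypothetical helper, needs 7za installed)."""
    # check=True raises CalledProcessError if 7za exits with an error code.
    subprocess.run(build_7z_delete_command(archive, member, exe), check=True)
```

Passing the arguments as a list avoids going through a shell, so file names with spaces need no extra quoting.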
Upvotes: 1
Reputation: 1414
Not with tarfile directly, although there may be some other library out there. A quick hack you can do is to extract the files, then recreate the tar minus the files you want to delete.
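A variant of that hack, sketched below, copies the members straight into a new archive and skips the unwanted ones, so nothing has to be extracted to disk first. The function name rebuild_without and the ".tmp" suffix are made up for illustration. It still needs enough free space for the second archive, always writes an uncompressed tar, and has not been checked against unusual members such as sparse files.

```python
import os
import tarfile


def rebuild_without(tar_path, names_to_drop):
    """Rewrite the archive, leaving out the given member names.

    Hypothetical helper; the new archive replaces the old one at the end.
    """
    names_to_drop = set(names_to_drop)
    tmp_path = tar_path + ".tmp"  # assumed temporary file name
    with tarfile.open(tar_path, "r") as src, tarfile.open(tmp_path, "w") as dst:
        for member in src:
            if member.name in names_to_drop:
                continue
            # Only regular files carry data; directories and links do not.
            data = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, data)
    os.replace(tmp_path, tar_path)
```

For the example in the question, the call would be rebuild_without("x.tar", ["a.txt"]).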
Upvotes: 3