Chris R

Reputation: 17906

How can I modify a file in a gzipped tar file?

I want to write a script (preferably in Python) that modifies the content of one file inside a gzipped tar file. The script must run on FreeBSD 6+.

Basically, I need to:

I'll be repeating this for a lot of files.

Python's tarfile module doesn't seem to be able to open tar files for read/write access when they're compressed, which makes a certain amount of sense. However, I can't find a way to copy the tar file with modifications, either.

Is there an easy way to do this?

Upvotes: 4

Views: 7879

Answers (3)

Ulrich Eckhardt

Reputation: 17415

I think David Phillips already answered quite well, but here's some example code on top:

import tarfile

with tarfile.open(input_tar_file, 'r:gz') as input_archive:
    with tarfile.open(output_tar_file, 'w:gz') as output_archive:
        # Iterate over the members directly so duplicate names keep their order.
        for info in input_archive:
            # Only regular files carry data; directories and links are header-only.
            file = input_archive.extractfile(info) if info.isreg() else None
            print(f'loaded {info.name} size {info.size}')
            output_archive.addfile(info, file)

This code copies input_tar_file to output_tar_file member by member. If you want to modify things, start at the print() call: there you can inspect each member, discard it, or modify it as you desire.

Things to keep in mind:

  • Make sure you write a directory before writing a file into that directory.
  • The size is effectively given twice when adding a file: once in info.size, and once implicitly by the length of the file stream. addfile() reads exactly info.size bytes from the stream, so update info.size whenever you change the content.
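
To make the size point concrete, here is a minimal sketch of a copy that changes one member. The function name copy_with_modification and its parameters are my own, not part of tarfile, and it assumes every member fits comfortably in memory:

```python
import io
import tarfile


def copy_with_modification(src_path, dst_path, member_name, transform):
    """Copy a .tar.gz, passing one member's bytes through transform()."""
    with tarfile.open(src_path, 'r:gz') as src, \
         tarfile.open(dst_path, 'w:gz') as dst:
        for info in src:
            if not info.isreg():
                # Directories, links, etc. carry no data: write the header only.
                dst.addfile(info)
                continue
            data = src.extractfile(info).read()
            if info.name == member_name:
                data = transform(data)
                # addfile() reads exactly info.size bytes, so keep it in sync.
                info.size = len(data)
            dst.addfile(info, io.BytesIO(data))
```

Something like copy_with_modification('in.tar.gz', 'out.tar.gz', 'etc/app.conf', lambda b: b.replace(b'old', b'new')) would then rewrite that one file and leave every other member untouched (the paths here are made up).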

Upvotes: 0

Ryan Christensen

Reputation: 7933

I don't see an easy way to remove a single file. You can easily extract one or all, then add any files needed.

I think that the only way is:

  • Rename the original tar file, then open it with Python's tarfile module.
  • Create a new, empty tar file under the original file name.
  • Re-add all the files, changing the one you need before re-adding it.
  • Be sure to set the correct format when you re-create the archive:

    tarfile.USTAR_FORMAT: POSIX.1-1988 (ustar) format.
    tarfile.GNU_FORMAT: GNU tar format.
    tarfile.PAX_FORMAT: POSIX.1-2001 (pax) format.
    tarfile.DEFAULT_FORMAT

http://docs.python.org/library/tarfile.html
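
A rough sketch of those steps, in case it helps. The name rewrite_archive, the '.orig' backup suffix and the GNU_FORMAT default are all assumptions of mine; pass whichever format your original archive actually uses:

```python
import io
import os
import tarfile


def rewrite_archive(path, member_name, new_data, fmt=tarfile.GNU_FORMAT):
    """Rename the original aside, then rebuild it under the original name."""
    backup = path + '.orig'  # hypothetical backup name
    os.rename(path, backup)
    with tarfile.open(backup, 'r:gz') as old, \
         tarfile.open(path, 'w:gz', format=fmt) as new:
        for info in old:
            if info.isreg() and info.name == member_name:
                # Replace the content and keep the header size in sync.
                info.size = len(new_data)
                new.addfile(info, io.BytesIO(new_data))
            elif info.isreg():
                new.addfile(info, old.extractfile(info))
            else:
                # Directories, links, etc.: header only.
                new.addfile(info)
    return backup
```

The function returns the backup path, so the caller can delete the renamed original once the new archive has been checked.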

Upvotes: 1

David Phillips

Reputation: 10208

Don't think of a tar file as a database that you can read/write -- it's not. A tar file is a concatenation of files. To modify a file in the middle, you need to rewrite the rest of the archive. (For files of a certain size, you might be able to exploit the block padding.)

What you want to do is process the tarball file by file, copying files (with modifications) into a new tarball. The Python tarfile module should make this easy to do. You should be able to retain the attributes by copying them from the old TarInfo object to the new one.
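
As a small illustration of copying the attributes across, something along these lines should work. The helper name replacement_info and the choice of which attributes to copy are mine, so adjust the list if you need others (linkname, pax_headers, and so on):

```python
import tarfile


def replacement_info(old_info, new_size):
    """Build a fresh TarInfo that keeps the old member's metadata."""
    new_info = tarfile.TarInfo(name=old_info.name)
    # Copy ownership, permissions, timestamp and member type from the original.
    for attr in ('mode', 'uid', 'gid', 'uname', 'gname', 'mtime', 'type'):
        setattr(new_info, attr, getattr(old_info, attr))
    # The size must describe the new content, not the old one.
    new_info.size = new_size
    return new_info
```

You would then call new_tar.addfile(replacement_info(info, len(data)), io.BytesIO(data)) for the modified file, and pass the old TarInfo through unchanged for everything else.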

Upvotes: 6
