John

Reputation: 539

Why aren't file contents getting copied into my tarfile

Here's some code meant to copy the contents of a zipfile to a tarfile. Later I intend to limit the copying to files that appear in a list passed in as a further argument, but for now I'm just trying to get the copying to work.

import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue

                print "File: ", filename
                # copy file (taken from zipfile's extract)
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name
                    shutil.copyfileobj(source, temp)
                    tar.add(temp.name, arcname=filename)


gather_and_repackage_files("./stuff.zip", "./tarfile.tar")

Before I run this, my directory contains "testin.py" (the program above) and "stuff.zip". "stuff.zip" is a zipfile containing two tiny text files, a.txt and b.txt, each of which contains about 15 characters. Apparently it also contains Mac backups of these, "._a.txt" and "._b.txt" (although when I expand it with the Archive Utility, those do not appear, even with "ls -al").

After execution (Python 2.7.10), there's an additional file "tarfile.tar"; when I open this with the Archive utility on my Mac, I see this:

drwx------  6 jfh  staff  204 Oct 29 16:51 .
drwxr-xr-x  7 jfh  staff  238 Oct 29 16:51 ..
-rw-------  1 jfh  staff    0 Oct 29 16:50 ._a.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 ._b.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 a.txt
-rw-------  1 jfh  staff    0 Oct 29 16:50 b.txt

The temporary files created during execution actually DO contain the 15 or so characters of silly text, but the ones in the tarfile are zero-length.

So my question is "Why does the tar-file contain 0-length versions of a.txt and b.txt?"

Upvotes: 1

Views: 343

Answers (2)

de1

Reputation: 3124

The temp file may not have been completely flushed.

You could try calling temp.flush() and then os.fsync(temp.fileno()) before tar.add (os.fsync needs the file descriptor as its argument).
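To illustrate the flushing point, here is a minimal sketch that compares the on-disk size of a temp file before and after flushing. The helper name sizes_before_and_after_flush is made up for this example, and the "before" size is only typically 0 for small writes; it depends on the buffer size in use.

```python
import os
import tempfile


def sizes_before_and_after_flush(data):
    """Return the on-disk size of a temp file before and after flushing."""
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        temp.write(data)
        # Small writes usually still sit in the file object's buffer here.
        before = os.path.getsize(temp.name)
        temp.flush()             # push Python's buffer to the OS
        os.fsync(temp.fileno())  # ask the OS to commit it to disk
        after = os.path.getsize(temp.name)
    os.remove(temp.name)         # clean up, since delete=False was used
    return before, after


before, after = sizes_before_and_after_flush(b"about 15 chars\n")
print(before, after)
```

Anything that reads the file by name between the write and the flush, as tar.add does, sees the "before" size.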

But of course it would be better not to create the temp file in the first place, which you can avoid by using tar.addfile instead of tar.add.

With tar.addfile you also need to set the size on the TarInfo that you provide, because exactly that many bytes are read from the file object.

Note: you could also set mtime on the TarInfo to preserve the modification time.
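For the mtime part, something along these lines might work. The helper name tarinfo_from_zipinfo is hypothetical, and treating the zip timestamp as local time is an assumption of this sketch, since ZipInfo.date_time carries no timezone.

```python
import tarfile
import time
import zipfile


def tarinfo_from_zipinfo(info, arcname):
    """Build a TarInfo carrying the size and modification time of a ZipInfo."""
    tarinfo = tarfile.TarInfo(arcname)
    tarinfo.size = info.file_size
    # date_time is (year, month, day, hour, minute, second); pad it out to
    # the 9-tuple that time.mktime expects and let it work out DST (-1).
    tarinfo.mtime = int(time.mktime(info.date_time + (0, 0, -1)))
    return tarinfo


# Usage with a hand-built ZipInfo; in practice it would come from
# zip_file.infolist().
example = zipfile.ZipInfo("a.txt", (2015, 10, 29, 16, 50, 0))
example.file_size = 15
print(tarinfo_from_zipinfo(example, "a.txt").mtime)
```

The resulting TarInfo can then be passed to tar.addfile together with the open zip member.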

This modification should do it:

import zipfile
import tarfile
import os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for info in zip_file.infolist():
                filename = os.path.basename(info.filename)
                # skip directories
                if not filename:
                    continue

                # stream the member straight into the tar; no temp file needed
                with zip_file.open(info) as source:
                    tarinfo = tarfile.TarInfo(filename)
                    tarinfo.size = info.file_size
                    tar.addfile(tarinfo, source)


gather_and_repackage_files("./stuff.zip", "./tarfile.tar")

Upvotes: 0

John

Reputation: 539

Here is working code:

import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path) :
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue

                print "File: ", filename
                print "Member: ", member
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name

                    shutil.copyfileobj(source, temp)

                    temp.close()
                    tar.add(temp.name, arcname=filename)

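The secret sauce is in 'temp.close()', one line before the end. tar.add reads the file from disk by name, and until the temp file is flushed or closed the copied data can still be sitting in the file object's write buffer, so the archive ends up with a zero-length entry (the tarfile documentation doesn't seem to mention this).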
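A small experiment along these lines shows the difference. The helper name archived_size and the in-memory tar are choices made for this sketch only; it adds the same temp file to a tar either before or after closing it and reports the size that ended up in the archive.

```python
import io
import os
import tarfile
import tempfile


def archived_size(data, close_first):
    """Add a temp file holding `data` to an in-memory tar; return the stored size."""
    buf = io.BytesIO()
    temp = tempfile.NamedTemporaryFile(delete=False)
    try:
        temp.write(data)
        if close_first:
            temp.close()  # flushes the buffer, so the data is on disk
        with tarfile.open(fileobj=buf, mode="w") as tar:
            tar.add(temp.name, arcname="a.txt")
    finally:
        temp.close()      # closing twice is harmless
        os.remove(temp.name)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getmember("a.txt").size


print(archived_size(b"about 15 chars\n", close_first=True))
print(archived_size(b"about 15 chars\n", close_first=False))
```

With close_first=True the stored size matches the data length; with close_first=False it is typically 0 for small files like these, which is the symptom from the question.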

Upvotes: 1
