Reputation: 539
Here's some code meant to copy the contents of a zipfile to a tarfile. I intend later to limit the copying to files that appear in a list passed in as a further argument, but for now I'm just trying to get the copying to work.
import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path):
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue
                print "File: ", filename
                # copy file (taken from zipfile's extract)
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name
                    shutil.copyfileobj(source, temp)
                    tar.add(temp.name, arcname=filename)

gather_and_repackage_files("./stuff.zip", "./tarfile.tar")
Before I run this, the contents of my directory are "testin.py" (the program above) and "stuff.zip". "stuff.zip" is a zipfile containing two tiny text files, a.txt and b.txt, each of which contains about 15 characters. Apparently it also contains Mac backups of these, "._a.txt" and "._b.txt" (although when I expand it with the Archive utility, those do not appear, even with "ls -al").
After execution (Python 2.7.10), there's an additional file "tarfile.tar"; when I open this with the Archive utility on my Mac, I see this:
drwx------ 6 jfh staff 204 Oct 29 16:51 .
drwxr-xr-x 7 jfh staff 238 Oct 29 16:51 ..
-rw------- 1 jfh staff 0 Oct 29 16:50 ._a.txt
-rw------- 1 jfh staff 0 Oct 29 16:50 ._b.txt
-rw------- 1 jfh staff 0 Oct 29 16:50 a.txt
-rw------- 1 jfh staff 0 Oct 29 16:50 b.txt
The temporary files created during execution actually DO contain the 15 or so characters of silly text, but the ones in the tarfile are zero-length.
So my question is "Why does the tar-file contain 0-length versions of a.txt and b.txt?"
Upvotes: 1
Views: 343
Reputation: 3124
The temp file may not have been completely flushed to disk by the time tar.add reads it.
You could try calling temp.flush() and os.fsync(temp.fileno()) before tar.add.
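A minimal sketch of that flush-then-add approach, in case it helps. The helper name add_stream_via_tempfile is made up for illustration, and it assumes a platform (such as macOS or Linux) where a named temporary file can be opened by name while it is still open:

```python
import os
import shutil
import tarfile
import tempfile


def add_stream_via_tempfile(tar, source, arcname):
    """Spool `source` into a temp file, flush it, then add it to `tar`.

    Hypothetical helper: `tar` is an open tarfile.TarFile in write mode,
    `source` is any binary file-like object.
    """
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        shutil.copyfileobj(source, temp)
        temp.flush()              # push Python's write buffer to the OS
        os.fsync(temp.fileno())   # ask the OS to write the data to disk
        # tar.add reads the file by name, so it now sees the full contents
        tar.add(temp.name, arcname=arcname)
    os.remove(temp.name)          # delete=False means we clean up ourselves
```

The flush is what matters for tar.add; the fsync is only an extra safeguard.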
But of course it would be better not to create the temp file in the first place, which you can avoid by using tar.addfile instead of tar.add.
You also need to set the size of the tarinfo that you provide, because tar.addfile reads exactly tarinfo.size bytes from the file object.
Note: you could also set mtime to preserve the modification time.
This modification should do it:
import zipfile
import tarfile
import os

def gather_and_repackage_files(zip_file_path, target_file_path):
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for info in zip_file.infolist():
                filename = os.path.basename(info.filename)
                # skip directories
                if not filename:
                    continue
                # stream the member straight into the tar, no temp file
                with zip_file.open(info) as source:
                    tarinfo = tarfile.TarInfo(filename)
                    tarinfo.size = info.file_size
                    tar.addfile(tarinfo, source)

gather_and_repackage_files("./stuff.zip", "./tarfile.tar")
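For what it's worth, here is a rough sketch of how the mtime note and the "only copy files from a list" idea from the question might be combined. The function name zip_to_tar and the wanted_names parameter are invented for this example, and it assumes the zip timestamps are in local time (zip entries carry no timezone):

```python
import os
import tarfile
import time
import zipfile


def zip_to_tar(zip_file_path, target_file_path, wanted_names=None):
    """Copy files from a zip into a flat tar and return the names copied.

    wanted_names is a hypothetical optional collection of base names to
    keep; None means copy every file.
    """
    copied = []
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for info in zip_file.infolist():
                filename = os.path.basename(info.filename)
                if not filename:
                    continue  # directory entry
                if wanted_names is not None and filename not in wanted_names:
                    continue  # not on the requested list
                tarinfo = tarfile.TarInfo(filename)
                tarinfo.size = info.file_size
                # date_time is a 6-tuple (Y, M, D, h, m, s); pad it to the
                # 9 fields mktime expects, with -1 meaning "DST unknown"
                tarinfo.mtime = time.mktime(info.date_time + (0, 0, -1))
                source = zip_file.open(info)
                try:
                    tar.addfile(tarinfo, source)
                finally:
                    source.close()
                copied.append(filename)
    return copied
```

Calling zip_to_tar("./stuff.zip", "./tarfile.tar", ["a.txt"]) would then copy only a.txt.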
Upvotes: 0
Reputation: 539
Here is working code:
import zipfile, tempfile, shutil, tarfile, os

def gather_and_repackage_files(zip_file_path, target_file_path):
    with tarfile.open(target_file_path, "w") as tar:
        with zipfile.ZipFile(zip_file_path) as zip_file:
            for member in zip_file.namelist():
                filename = os.path.basename(member)
                # skip directories
                if not filename:
                    continue
                print "File: ", filename
                print "Member: ", member
                source = zip_file.open(member)
                with tempfile.NamedTemporaryFile(delete=False) as temp:
                    print temp.name
                    shutil.copyfileobj(source, temp)
                    temp.close()
                    tar.add(temp.name, arcname=filename)
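The secret sauce is in 'temp.close()', one line before the end. It turns out that tar.add reads the file from disk by name, so anything still sitting in the open file's write buffer is not seen; closing (or flushing) the file first writes the data out. The documentation doesn't seem to mention this.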
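A small sketch that makes the effect visible, assuming the default buffering of NamedTemporaryFile and a write much smaller than the buffer. The function name sizes_before_and_after_close is just for illustration:

```python
import os
import tempfile


def sizes_before_and_after_close(data):
    """Return (size on disk while still open, size on disk after close)."""
    temp = tempfile.NamedTemporaryFile(delete=False)
    try:
        temp.write(data)
        # A small write normally sits in Python's buffer at this point,
        # so a reader that opens the file by name sees less than `data`.
        size_while_open = os.path.getsize(temp.name)
        temp.close()  # closing flushes the buffer to the file
        size_after_close = os.path.getsize(temp.name)
    finally:
        temp.close()          # harmless if already closed
        os.remove(temp.name)  # delete=False means we clean up ourselves
    return size_while_open, size_after_close
```

With about 15 bytes of data, the first size should come back as 0, which matches the zero-length entries in the tarfile.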
Upvotes: 1