Reputation: 163
I am trying to add a file to a gzipped tarfile in Python:
import tarfile
# create test file
with open("testfile.txt", "w") as f:
    f.write("TESTTESTTEST")
# create archive
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        archive.addfile(tarfile.TarInfo("testfile.txt"), f)
# read test file out of archive
with tarfile.open("archfile.tar.gz", "r:gz") as archive:
    print(archive.extractfile("testfile.txt").read())
The result is b'', an empty bytestring.
The file is not empty - if I try to read the file using the following code:
with open("testfile.txt", 'rb') as f:
    print(f.read())
... I get b'TESTTESTTEST'
Is there something obvious I am missing? My end goal is to add the string from memory using f = io.StringIO('TESTTESTTEST').
I also tried removing the :gz and I see the same problem with a raw tar archive.
For additional info - I'm using Python 3 in a jupyter session on Windows 10. I see the same problem in Windows/Python 3.5.2/PyCharm.
Upvotes: 2
Views: 3591
Reputation: 22245
When you create a tar archive, the TarInfo object must contain the file size; otherwise addfile writes a member with no data. The easiest solution is to use gettarinfo, which has the signature
TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
Given an open file object, it determines the size. With a correctly created TarInfo object, addfile will copy in the bytes.
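import tarfile
# example input; replace with the paths you want to archive
file_paths = ["testfile.txt"]
with tarfile.open('Archive.tar.xz', mode='w:xz') as t:
    for file_path in file_paths:
        with open(file_path, 'rb') as f:
            # gettarinfo fills in the size (and other metadata) from the open file
            info = t.gettarinfo(fileobj=f)
            t.addfile(tarinfo=info, fileobj=f)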
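To make the role of the size field concrete, here is a minimal sketch that adds the same payload twice to an in-memory archive, once with a bare TarInfo and once with size set by hand. The member names bare.txt and sized.txt and the in-memory buffer are illustrative assumptions, not part of the original question.

```python
import io
import tarfile

payload = b"TESTTESTTEST"

# A freshly constructed TarInfo knows only its name; size defaults to 0.
bare = tarfile.TarInfo("bare.txt")

# Same payload, but this time the size is filled in by hand.
sized = tarfile.TarInfo("sized.txt")
sized.size = len(payload)

buffer = io.BytesIO()  # in-memory archive, so nothing is written to disk
with tarfile.open(fileobj=buffer, mode="w") as archive:
    archive.addfile(bare, io.BytesIO(payload))   # copies bare.size == 0 bytes
    archive.addfile(sized, io.BytesIO(payload))  # copies sized.size == 12 bytes

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    bare_data = archive.extractfile("bare.txt").read()
    sized_data = archive.extractfile("sized.txt").read()

print(bare.size, bare_data)    # 0 b''
print(sized.size, sized_data)  # 12 b'TESTTESTTEST'
```

The first member reproduces the empty result from the question; the second shows that setting size is enough for addfile to copy the data.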
Upvotes: 0
Reputation: 2784
I hit a similar problem. The documentation says that when you call tar.addfile, it writes TarInfo.size bytes from the given file. That means you have to either create the TarInfo with the file size or use tar.add() instead of tar.addfile:
import io
import tarfile
# create archive V1
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        info = archive.gettarinfo("testfile.txt")
        archive.addfile(info, f)
# create archive V2
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    archive.add("testfile.txt")
# create archive V3
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
    with io.BytesIO(b"TESTTESTTEST") as f:
        info = tarfile.TarInfo("testfile.txt")
        f.seek(0, io.SEEK_END)
        info.size = f.tell()
        f.seek(0, io.SEEK_SET)
        archive.addfile(info, f)
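Since the question's end goal was to add a string held in memory, here is a small sketch along the lines of V3 that starts from a str rather than bytes. The helper name add_text and the UTF-8 default are assumptions made for illustration; tarfile itself has no such function.

```python
import io
import tarfile


def add_text(archive, arcname, text, encoding="utf-8"):
    """Add a str to an open TarFile as a regular file (hypothetical helper)."""
    data = text.encode(encoding)   # tar members hold bytes, so encode first
    info = tarfile.TarInfo(arcname)
    info.size = len(data)          # addfile copies exactly this many bytes
    archive.addfile(info, io.BytesIO(data))


# Round trip through an in-memory gzipped archive.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    add_text(archive, "testfile.txt", "TESTTESTTEST")

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    restored = archive.extractfile("testfile.txt").read().decode("utf-8")

print(restored)
```

Taking the size from the encoded bytes rather than from len(text) matters for non-ASCII text, where the two can differ.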
Upvotes: 8
Reputation: 163
Not a perfect answer but I managed to work around this with zipfile.
import zipfile
import io
# create archive
with zipfile.ZipFile("archfile.zip", "w") as archive:
    with io.StringIO("TESTTESTTEST") as f:
        archive.writestr("1234.txt", f.read())
# read test file out of archive
with zipfile.ZipFile("archfile.zip", "r") as archive:
    print(archive.read("1234.txt"))
This produces b'TESTTESTTEST'.
Upvotes: -1
Reputation: 82765
You can use the StringIO module (Python 2 only; in Python 3 the equivalent is io.BytesIO) to write the content as a file object to the tar file.
Sample:
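import tarfile
import StringIO  # Python 2 only; this module does not exist in Python 3
# TarFile(...) applies no compression, so this writes a plain tar despite the .gz name
tar = tarfile.TarFile("archfile.tar.gz","w")
with open("testfile.txt", 'rb') as f:
    s = StringIO.StringIO(f.read())
info = tarfile.TarInfo(name="testfile.txt")
# the size must be set, otherwise addfile copies no data
info.size = len(s.buf)
tar.addfile(tarinfo=info, fileobj=s)
tar.close()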
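For Python 3, which the question uses, a rough port of the sample might look like the sketch below. It swaps in io.BytesIO for StringIO.StringIO and tarfile.open with "w:gz" so the archive is actually gzipped; the temporary directory is only there to keep the example self-contained and is not part of the original answer.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "testfile.txt")
    archive_path = os.path.join(workdir, "archfile.tar.gz")

    # Stand-in for the question's test file.
    with open(source_path, "w") as f:
        f.write("TESTTESTTEST")

    with open(source_path, "rb") as f:
        s = io.BytesIO(f.read())      # replaces StringIO.StringIO(f.read())

    info = tarfile.TarInfo(name="testfile.txt")
    info.size = len(s.getvalue())     # replaces len(s.buf)

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.addfile(tarinfo=info, fileobj=s)

    # Read the member back to confirm the data made it into the archive.
    with tarfile.open(archive_path, "r:gz") as tar:
        ported_result = tar.extractfile("testfile.txt").read()

print(ported_result)  # b'TESTTESTTEST'
```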
Upvotes: 1