dilaudid

Reputation: 163

Files added to tarfile come back as empty files

I am trying to add a file to a gzipped tarfile in Python:

import tarfile

# create test file
with open("testfile.txt", "w") as f:
    f.write("TESTTESTTEST")

# create archive
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        archive.addfile(tarfile.TarInfo("testfile.txt"), f)

# read test file out of archive
with tarfile.open("archfile.tar.gz", "r:gz") as archive:
    print(archive.extractfile("testfile.txt").read())

The result is b'', an empty byte string.

The file itself is not empty. If I read it directly with the following code:

with open("testfile.txt", 'rb') as f:
    print(f.read())

...I get b'TESTTESTTEST'.

Is there something obvious I am missing? My end goal is to add the string from memory using f = io.StringIO('TESTTESTTEST').

I also tried removing the :gz and see the same problem with a plain (uncompressed) tar archive.

For additional info: I'm using Python 3 in a Jupyter session on Windows 10. I see the same problem with Python 3.5.2 in PyCharm on Windows.

Upvotes: 2

Views: 3591

Answers (4)

Cameron Lowell Palmer

Reputation: 22245

Creating a tar archive using tarfile

When you create a tar archive, it is critical that the TarInfo object contains the file size; otherwise the members in the archive will have no data. The easiest solution is to use gettarinfo, which has the signature:

TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

Given an open file object, it determines the size (and the other metadata) for you. With a correctly created TarInfo object, addfile will copy in the bytes.

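import tarfile

# Paths to add; replace with your own list
file_paths = ["testfile.txt"]

with tarfile.open('Archive.tar.xz', mode='w:xz') as t:
    for file_path in file_paths:
        with open(file_path, 'rb') as f:
            # gettarinfo() stats the open file, so info.size is set correctly
            info = t.gettarinfo(fileobj=f)
            t.addfile(tarinfo=info, fileobj=f)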
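To see why the size matters, here is a small check. It is a sketch; sizes.tar and the sample file are made up for the demonstration. A bare TarInfo starts with size 0, while gettarinfo() fills in the real size from the open file. The arcname argument sets the member name stored in the archive instead of the path the file was opened with.

```python
import tarfile

# Sample file, created only for this demonstration
with open("testfile.txt", "w") as f:
    f.write("TESTTESTTEST")

# A TarInfo built by hand has size 0, so addfile() would copy no data
bare = tarfile.TarInfo("testfile.txt")
print(bare.size)  # 0

with tarfile.open("sizes.tar", "w") as t:
    with open("testfile.txt", "rb") as f:
        # gettarinfo() stats the open file, so size (and mode, mtime, ...) are filled in;
        # arcname controls the member name stored in the archive
        info = t.gettarinfo(fileobj=f, arcname="testfile.txt")
        print(info.size)  # 12
        t.addfile(info, f)
```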

Upvotes: 0

Roman Kutlak

Reputation: 2784

I hit a similar problem. The documentation says that when you call tar.addfile it will write TarInfo.size bytes from the given file. That means that you have to either create the TarInfo with the file size or use tar.add() instead of tar.addfile:

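import io
import tarfile

# "w:gz" overwrites an existing archive, so the three versions can run one after another
# (mode "x:gz" would raise FileExistsError on the second version)

# create archive V1: let gettarinfo() fill in the size
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        info = archive.gettarinfo("testfile.txt")
        archive.addfile(info, f)

# create archive V2: let add() handle everything
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
    archive.add("testfile.txt")

# create archive V3: in-memory data, size set by hand
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
    with io.BytesIO(b"TESTTESTTEST") as f:
        info = tarfile.TarInfo("testfile.txt")
        f.seek(0, io.SEEK_END)   # move to the end to measure the length
        info.size = f.tell()
        f.seek(0, io.SEEK_SET)   # rewind so addfile() reads from the start
        archive.addfile(info, f)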
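The documented behaviour, that addfile() copies exactly TarInfo.size bytes from the file object, can be checked directly. In this sketch (truncated.tar is a hypothetical name) the size is deliberately set too small, so only part of the data ends up in the archive. With the default size of 0, nothing is copied at all, which is the empty-file symptom from the question.

```python
import io
import tarfile

data = b"TESTTESTTEST"

with tarfile.open("truncated.tar", "w") as tar:
    info = tarfile.TarInfo("testfile.txt")
    info.size = 4                          # deliberately smaller than len(data)
    tar.addfile(info, io.BytesIO(data))    # copies only the first 4 bytes

with tarfile.open("truncated.tar", "r") as tar:
    print(tar.extractfile("testfile.txt").read())  # b'TEST'
```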

Upvotes: 8

dilaudid

Reputation: 163

Not a perfect answer, but I managed to work around this with zipfile:

import zipfile
import io

# create archive
with zipfile.ZipFile("archfile.zip", "w") as archive:
    with io.StringIO("TESTTESTTEST") as f:
        archive.writestr("1234.txt", f.read())

# read test file out of archive
with zipfile.ZipFile("archfile.zip", "r") as archive:
    print(archive.read("1234.txt"))

This produces b'TESTTESTTEST'.
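As a side note, the StringIO wrapper is not needed with zipfile: ZipFile.writestr() accepts the data either as bytes or as a str (which it encodes as UTF-8), so the string can be passed directly. A minimal sketch reusing the file names from this answer:

```python
import zipfile

# create archive: writestr() takes the str directly, no file object needed
with zipfile.ZipFile("archfile.zip", "w") as archive:
    archive.writestr("1234.txt", "TESTTESTTEST")

# read test file out of archive
with zipfile.ZipFile("archfile.zip", "r") as archive:
    print(archive.read("1234.txt"))  # b'TESTTESTTEST'
```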

Upvotes: -1

Rakesh

Reputation: 82765

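You can use the StringIO module (Python 2) to wrap the content in a file object and write it to the tar file; in Python 3 the equivalent is io.BytesIO. Either way, set info.size before calling addfile.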

Sample:

import io
import tarfile

# Python 3: io.BytesIO replaces Python 2's StringIO.StringIO,
# and mode "w:gz" actually gzip-compresses (TarFile(..., "w") does not)
with open("testfile.txt", 'rb') as f:
    data = f.read()

with tarfile.open("archfile.tar.gz", "w:gz") as tar:
    info = tarfile.TarInfo(name="testfile.txt")
    info.size = len(data)
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(data))
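For the question's end goal of storing a string from memory, the same idea can be wrapped in a small helper. This is a sketch; add_string and strings.tar.gz are hypothetical names. tarfile writes bytes, so the text is encoded first, and info.size must be the length of the encoded bytes, which differs from len(text) once non-ASCII characters appear. Setting mtime is optional; without it, TarInfo leaves the timestamp at 0 (the Unix epoch).

```python
import io
import tarfile
import time

def add_string(archive, name, text, encoding="utf-8"):
    """Hypothetical helper: store `text` in `archive` as a member called `name`."""
    data = text.encode(encoding)        # tarfile writes bytes, not str
    info = tarfile.TarInfo(name)
    info.size = len(data)               # byte length, not len(text)
    info.mtime = int(time.time())       # otherwise the timestamp stays 0 (1970)
    archive.addfile(info, io.BytesIO(data))

with tarfile.open("strings.tar.gz", "w:gz") as archive:
    add_string(archive, "testfile.txt", "TESTTESTTEST")
    add_string(archive, "note.txt", "café")   # 4 characters, 5 bytes in UTF-8

with tarfile.open("strings.tar.gz", "r:gz") as archive:
    print(archive.extractfile("testfile.txt").read())
    print(archive.extractfile("note.txt").read().decode("utf-8"))
```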

Upvotes: 1
