I am trying to use tarfile to add a file in memory and then write it back to disk. The issue I am having is that when I extract the newly created tar.gz file, I get an empty file. What am I doing wrong in my code?
import tarfile
import io

with open('logo.png', 'rb') as f:
    data = f.read()

fh = io.BytesIO()
with tarfile.open(fileobj=fh, mode='w:gz') as tar:
    info = tarfile.TarInfo('some.png')
    tar.addfile(info, data)

with open('/tmp/test/test.tar.gz', 'wb') as f:
    f.write(fh.getvalue())
I also tried tar.addfile(info, fh.write(data)), but that just creates a corrupted tar file.
TarFile.addfile() takes a file-like object as its second argument, not a bytes object.
When the documentation says "tarinfo.size bytes are read from it and added to the archive", it means that tarinfo.size is used to determine how many bytes to read. A freshly created TarInfo has a size of 0, so unless you set tarinfo.size appropriately, zero bytes are copied and the extracted file is empty.
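To see this effect in isolation, here is a minimal sketch that builds the same kind of in-memory .tar.gz twice: once leaving the TarInfo size at its default and once setting it to the payload length. The member name demo.bin, the fake payload, and the helpers build_archive/read_member are hypothetical, chosen only for illustration.

```python
import io
import tarfile


def build_archive(payload: bytes, set_size: bool) -> bytes:
    """Write payload into an in-memory .tar.gz and return the archive bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        info = tarfile.TarInfo('demo.bin')  # hypothetical member name
        if set_size:
            info.size = len(payload)  # tells addfile() how many bytes to copy
        tar.addfile(info, io.BytesIO(payload))
    # Only read the buffer after the TarFile is closed, so the gzip stream is complete.
    return buf.getvalue()


def read_member(archive: bytes) -> bytes:
    """Reopen the archive from memory and return the member's contents."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode='r:gz') as tar:
        return tar.extractfile('demo.bin').read()


payload = b'\x89PNG fake image bytes'
print(tarfile.TarInfo('x').size)                             # 0, the default
print(read_member(build_archive(payload, set_size=False)))   # b'' (nothing copied)
print(read_member(build_archive(payload, set_size=True)) == payload)  # True
```

The only difference between the two archives is the info.size assignment; the file object passed to addfile() is identical in both calls.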
The only thing you need to do is read the data from the source, take its length, and then load that data into a BytesIO object. For example:
import tarfile
import io

with open('logo.png', 'rb') as f:
    data = f.read()

source_f = io.BytesIO(initial_bytes=data)

fh = io.BytesIO()
with tarfile.open(fileobj=fh, mode='w:gz') as tar:
    info = tarfile.TarInfo('logo.png')
    info.size = len(data)
    tar.addfile(info, source_f)

with open('test.tar.gz', 'wb') as f:
    f.write(fh.getvalue())
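To confirm that the file written to disk really contains the data, you can reopen it with tarfile and compare the member with the original bytes. This sketch follows the same steps as the code above, but assumes a generated stand-in payload instead of logo.png and writes test.tar.gz into a temporary directory so it runs anywhere.

```python
import io
import os
import tarfile
import tempfile

# Stand-in for the contents of logo.png (assumption: any bytes behave the same way).
data = bytes(range(256)) * 40

source_f = io.BytesIO(data)
fh = io.BytesIO()
with tarfile.open(fileobj=fh, mode='w:gz') as tar:
    info = tarfile.TarInfo('logo.png')
    info.size = len(data)
    tar.addfile(info, source_f)

with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, 'test.tar.gz')
    # fh now holds a complete gzip stream because the TarFile above has been closed.
    with open(out_path, 'wb') as f:
        f.write(fh.getvalue())

    # Read the archive back from disk and pull out the member's bytes.
    with tarfile.open(out_path, mode='r:gz') as tar:
        member = tar.getmember('logo.png')
        restored = tar.extractfile(member).read()

print(member.size == len(data), restored == data)  # True True
```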
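Or, in a more memory-efficient way, seek in the source file to find its size and pass the open file to addfile() directly instead of reading it into memory first: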
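import io
import tarfile

with open('logo.png', 'rb') as src:
    src.seek(0, io.SEEK_END)  # go to the end
    source_len = src.tell()   # offset at the end == file size in bytes
    src.seek(0)               # rewind so addfile() reads from the start

    fh = io.BytesIO()
    with tarfile.open(fileobj=fh, mode='w:gz') as tar:
        info = tarfile.TarInfo('logo.png')
        info.size = source_len
        tar.addfile(info, src)

# Use a different name here so the source file object is not shadowed.
with open('test.tar.gz', 'wb') as out:
    out.write(fh.getvalue())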
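As a swapped-in alternative to seeking by hand, TarFile.gettarinfo() can build the TarInfo from an open file: it calls os.fstat() on the file, so info.size is already filled in (along with the mode and mtime, which a bare TarInfo leaves at 0o644 and 0). This is a minimal sketch, assuming a real on-disk file (gettarinfo() needs fileno(), so it does not work on a BytesIO); a throwaway file stands in for logo.png so the example runs anywhere.

```python
import io
import os
import tarfile
import tempfile

fh = io.BytesIO()
with tempfile.TemporaryDirectory() as src_dir:
    # Throwaway source file standing in for logo.png.
    src_path = os.path.join(src_dir, 'logo.png')
    with open(src_path, 'wb') as tmp:
        tmp.write(b'\x89PNG' + b'\x00' * 1000)

    with open(src_path, 'rb') as src, tarfile.open(fileobj=fh, mode='w:gz') as tar:
        # gettarinfo() stats the open file, so info.size is the file's length.
        info = tar.gettarinfo(fileobj=src, arcname='logo.png')
        tar.addfile(info, src)  # src is still at offset 0, so all info.size bytes are copied

# Reopen the in-memory archive to confirm the member is not empty.
with tarfile.open(fileobj=io.BytesIO(fh.getvalue()), mode='r:gz') as tar:
    member = tar.getmember('logo.png')
    contents = tar.extractfile(member).read()

print(member.size)  # 1004
```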