I am looking for a way to pickle some Python objects into a combined tar archive. I also need to use np.save(....) to save some numpy arrays in the same archive. Of course, I also need to read them back later.
So what I tried is
a = np.linspace(1,10,10000)
tar = tarfile.open(fileName, "w")
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
tar.close()
and I get the error:
'numpy.ndarray' object has no attribute 'write'
I get similar problems if I pickle an object into the tar file. Any suggestions? If it is easier, json-pickle would also work.
EDIT: as mentioned in the comments I confused the arguments of np.save(). However, this does not solve the issue, as now I get the error:
object of type 'NoneType' has no len()
EDIT 2: If there is no solution to the above problem, do you know of any other time-efficient way to bundle files?
Upvotes: 2
Views: 5290
Reputation: 231738
First, I'm not an expert tar user, but I can point out a couple of things:
a = np.linspace(1,10,10000)
tar = tarfile.open(fileName, "w")
If you want to add a file to an existing archive, use the "a" mode (or study the available modes). "w" creates a new, empty archive.
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
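The correct use of np.save has already been mentioned: the file comes first and the array second. Note also that np.save returns None, so there is nothing useful to pass on to frombuf, which is where the 'NoneType' error comes from.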
A TarInfo object is not the file/data, but rather information about the file. That information is placed in the tar file before the data, in a 512-byte buffer. tobuf creates such a buffer from the attributes of the object; frombuf decodes such a buffer. It is used, for example, in the fromtarfile method:
def fromtarfile(cls, tarfile):
    """Return the next TarInfo object from TarFile object
       tarfile.
    """
    buf = tarfile.fileobj.read(BLOCKSIZE)
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
    obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
    return obj._proc_member(tarfile)
So clearly frombuf is not what you want to use here.
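To make the header/data distinction concrete, here is a minimal sketch of a tobuf/frombuf round trip. The member name "foo" and the size 123 are made up for illustration, and USTAR_FORMAT is passed explicitly on the assumption that it keeps the header to a single block.

```python
import tarfile

# Describe a hypothetical member called "foo" that would hold 123 bytes.
info = tarfile.TarInfo(name="foo")
info.size = 123

# tobuf() serialises only the metadata; USTAR_FORMAT is chosen here so
# the result is a single header block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
print(len(header), tarfile.BLOCKSIZE)

# frombuf() decodes such a header back into a TarInfo. No file data is
# involved at any point.
decoded = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
print(decoded.name, decoded.size)
```

Nothing here touches the array itself, which is why feeding the output of np.save to frombuf cannot work.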
A 2009 SO question - python write string directly to tarfile - shows that it is possible to write directly to a tarfile by using a string buffer. From the accepted answer:
# create a `StringIO` object, and fill it
string = StringIO.StringIO()
...
# create `TarInfo` object:
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
# use both with `addfile`:
tar.addfile(tarinfo=info, fileobj=string)
I think you can do a np.save to a StringIO buffer, but I'd have to check/test to be sure. For ordinary arrays, save writes a header with size, shape, dtype info, and then adds the array's data buffer. For other objects and arrays it resorts to pickle.
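Since the question also asks about pickled objects, the same buffer idea should carry over. Below is a rough sketch, not a tested recipe for your data: add_bytes is a hypothetical helper, the dictionary is a placeholder object, and the archive lives in an io.BytesIO only to keep the example self-contained.

```python
import io
import pickle
import tarfile


def add_bytes(tar, name, data):
    """Add the bytes `data` to the open archive `tar` as member `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)  # addfile copies exactly info.size bytes
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(data))


obj = {"label": "run-1", "values": [1, 2, 3]}  # placeholder object

archive = io.BytesIO()  # in-memory stand-in for a real tar file
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_bytes(tar, "obj.pkl", pickle.dumps(obj))

# Read the member back without extracting it to disk.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    restored = pickle.loads(tar.extractfile("obj.pkl").read())

print(restored == obj)
```

With a real file you would pass a file name to tarfile.open instead of fileobj=archive.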
I'd suggest getting a regular np.save to file, followed by addfile, working first. Then see if writing to a string buffer works and whether it saves any time.
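A minimal sketch of that first step, assuming a temporary directory and made-up file names (a.npy, bundle.tar): save the array to disk with np.save, then copy that file into the archive with gettarinfo and addfile.

```python
import os
import tarfile
import tempfile

import numpy as np

a = np.linspace(1, 10, 10000)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")       # hypothetical file names
    tar_path = os.path.join(tmp, "bundle.tar")

    np.save(npy_path, a)  # file name first, array second

    with tarfile.open(tar_path, "w") as tar:
        # gettarinfo fills in size, mtime, etc. from the file on disk
        info = tar.gettarinfo(name=npy_path, arcname="a.npy")
        with open(npy_path, "rb") as f:
            tar.addfile(tarinfo=info, fileobj=f)

    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()

print(names)
```

This costs one extra write to disk per array, which is what the buffer approach would avoid.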
Here's a test script. It writes one array to a tar file, closes and reopens the file and writes another, and finally it extracts the files and loads them. Returned shapes look fine. I haven't looked at whether it is possible to extract these files to memory buffers or not.
np.savez could do the same thing with zip archiving (rather than tar).
import io  # Python 3 version
import tarfile

import numpy as np

# write the first array into a new archive
abuf = io.BytesIO()
np.save(abuf, np.arange(100))
abuf.seek(0)
tar = tarfile.TarFile('test.tar', 'w')
info = tarfile.TarInfo(name='anArray')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# reopen in append mode and add a second array
abuf = io.BytesIO()
np.save(abuf, np.ones((2, 3, 4)))
abuf.seek(0)
tar = tarfile.TarFile('test.tar', 'a')
info = tarfile.TarInfo(name='anOther')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# reopen for reading, list the members, and extract them to disk
tar = tarfile.TarFile('test.tar', 'r')
print(tar.getnames())
tar.extractall()
# can I extract to buffers?
tar.close()

# load the extracted files
a = np.load('anArray')
b = np.load('anOther')
print(a.shape, b.shape)
Also, listing the archive from the shell:
1415:~/mypy$ tar -tvf test.tar
-rw-r--r-- 0/0 480 1969-12-31 16:00 anArray
-rw-r--r-- 0/0 272 1969-12-31 16:00 anOther
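On the open question of extracting to memory buffers: TarFile.extractfile returns a file-like object for a member, so something along these lines should work. This is a sketch that builds its own small in-memory archive rather than reusing test.tar, and it copies the member into a BytesIO before calling np.load as a conservative choice.

```python
import io
import tarfile

import numpy as np

# Build a small archive in memory, the same way the script above does.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    buf = io.BytesIO()
    np.save(buf, np.arange(100))
    info = tarfile.TarInfo(name="anArray")
    info.size = len(buf.getvalue())
    buf.seek(0)
    tar.addfile(tarinfo=info, fileobj=buf)

# Read the member back without writing anything to disk.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    member = tar.extractfile("anArray")
    a = np.load(io.BytesIO(member.read()))

print(a.shape)
```

For large arrays the intermediate copy doubles the memory use, so it may be worth testing whether np.load accepts the extractfile object directly in your setup.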
Upvotes: 4