user4290866

Write data directly to a tar archive

I am looking for a way to pickle some Python objects into a single tar archive. I also need to use np.save(...) to save some NumPy arrays in the same archive. Of course, I also need to read them back later.

So what I tried is

a = np.linspace(1,10,10000)    
tar = tarfile.open(fileName, "w")
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
tar.close()

and I get the error:

'numpy.ndarray' object has no attribute 'write'

I get similar problems if I pickle an object into the tar file. Any suggestions? If it is easier, json-pickle would also work.

EDIT: as mentioned in the comments I confused the arguments of np.save(). However, this does not solve the issue, as now I get the error:

object of type 'NoneType' has no len()

EDIT 2: If there is no solution to the above problem, do you know of any other time-efficient way to bundle files?

Upvotes: 2

Views: 5290

Answers (1)

hpaulj

Reputation: 231738

First, I'm not an expert tar user, but I can point out a couple of things:

 a = np.linspace(1,10,10000)    

 tar = tarfile.open(fileName, "w")

If you want to add a file to an existing archive, use the "a" mode (or study the available modes). "w" creates a new, empty archive.

 tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))

The correct use of np.save has already been mentioned.

A TarInfo object is not the file/data, but rather information about the file. That information is placed in the tar file before the data, in a 512-byte header block. tobuf creates such a block from the attributes of the object; frombuf decodes one. It is used, for example, in the fromtarfile method:

def fromtarfile(cls, tarfile):
    """Return the next TarInfo object from TarFile object
       tarfile.
    """
    buf = tarfile.fileobj.read(BLOCKSIZE)
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
    obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
    return obj._proc_member(tarfile)

So clearly frombuf is not what you want to use here.
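To make the header-versus-data distinction concrete, here is a minimal sketch of a tobuf/frombuf round trip. The member name and size are just illustrative values, and I pass USTAR_FORMAT explicitly on the assumption that this keeps the result to a single header block:

```python
import tarfile

# Metadata for a hypothetical 480-byte member; no file data is attached.
info = tarfile.TarInfo(name="anArray")
info.size = 480

# tobuf() serializes the metadata into one header block.
# USTAR_FORMAT is requested so that no extended header is emitted.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
header_len = len(header)

# frombuf() decodes such a block back into a TarInfo object.
decoded = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")
print(header_len, decoded.name, decoded.size)
```

Nothing here touches array data, which is why passing the result of np.save to frombuf cannot work.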

A 2009 SO question - python write string directly to tarfile - shows that it is possible to write directly to a tarfile by using a string buffer. From the accepted answer (Python 2 code):

# create a `StringIO` object, and fill it
string = StringIO.StringIO()
...
# create `TarInfo` object:
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
# use both with `addfile`:
tar.addfile(tarinfo=info, fileobj=string)
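In Python 3 the same pattern would use io.BytesIO instead of StringIO. The following is only a sketch of how a pickled object might be added this way: add_bytes is a helper name I made up, settings is a placeholder object, and the archive is kept in memory just to keep the example self-contained.

```python
import io
import pickle
import tarfile

def add_bytes(tar, name, payload):
    """Add `payload` (a bytes object) to an open TarFile as member `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)      # addfile() copies exactly info.size bytes
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(payload))

settings = {"alpha": 0.5, "labels": ["a", "b"]}   # hypothetical object

archive = io.BytesIO()            # in-memory archive, for the demo only
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_bytes(tar, "settings.pkl", pickle.dumps(settings))

# Rewind and read the member back without touching the disk.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    restored = pickle.loads(tar.extractfile("settings.pkl").read())
```

With a real file you would pass a file name to tarfile.open instead of fileobj=archive.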

I think you can do a np.save to a StringIO buffer, but I'd have to check/test to be sure. For ordinary arrays, save writes a header with shape and dtype info, and then adds the array's data buffer. For object arrays and other Python objects it falls back to pickle.

I'd suggest first getting a regular np.save to a file, followed by addfile, to work. Then see if writing to a string buffer works and whether it saves any time.


Here's a test script. It writes one array to a tar file, closes and reopens the file and writes another, and finally it extracts the files and loads them. Returned shapes look fine. I haven't looked at whether it is possible to extract these files to memory buffers or not.

np.savez could do much the same thing using zip archiving (rather than tar).

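import io   # Python 3; BytesIO takes the place of the Python 2 StringIO
import tarfile

import numpy as np

# save the first array into an in-memory buffer, then rewind it
abuf = io.BytesIO()
np.save(abuf, np.arange(100))
abuf.seek(0)

# create a new archive and add the buffer as member 'anArray'
tar = tarfile.TarFile('test.tar', 'w')
info = tarfile.TarInfo(name='anArray')
info.size = len(abuf.getbuffer())   # addfile copies exactly this many bytes
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# second array, again via a fresh buffer
abuf = io.BytesIO()
np.save(abuf, np.ones((2, 3, 4)))
abuf.seek(0)

# reopen the archive in append mode and add member 'anOther'
tar = tarfile.TarFile('test.tar', 'a')
info = tarfile.TarInfo(name='anOther')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# list the members, extract them to the current directory, and load them
tar = tarfile.TarFile('test.tar', 'r')
print(tar.getnames())
tar.extractall()
# can I extract to buffers?
tar.close()
a = np.load('anArray')
b = np.load('anOther')
print(a.shape, b.shape)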

Also, listing the archive from the shell:

1415:~/mypy$ tar -tvf test.tar 
-rw-r--r-- 0/0             480 1969-12-31 16:00 anArray 
-rw-r--r-- 0/0             272 1969-12-31 16:00 anOther
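On the open question of extracting to memory buffers: TarFile.extractfile returns a file-like object for a member, so something along these lines should avoid writing the extracted files to disk. This is a sketch, not something from the original test; save_array and load_array are helper names I made up, and the archive goes into a temporary directory only so the example cleans up after itself.

```python
import io
import os
import tarfile
import tempfile

import numpy as np

def save_array(tar, name, array):
    """np.save `array` into a buffer and add it to the open TarFile."""
    buf = io.BytesIO()
    np.save(buf, array)
    info = tarfile.TarInfo(name=name)
    info.size = buf.tell()        # number of bytes np.save wrote
    buf.seek(0)
    tar.addfile(tarinfo=info, fileobj=buf)

def load_array(tar, name):
    """Read member `name` into memory and hand it to np.load."""
    member = tar.extractfile(name)
    # Copy into BytesIO so np.load gets a plain seekable buffer.
    return np.load(io.BytesIO(member.read()))

path = os.path.join(tempfile.mkdtemp(), "test.tar")
with tarfile.open(path, "w") as tar:
    save_array(tar, "anArray", np.arange(100))
    save_array(tar, "anOther", np.ones((2, 3, 4)))

with tarfile.open(path, "r") as tar:
    a = load_array(tar, "anArray")
    b = load_array(tar, "anOther")
print(a.shape, b.shape)
```

The same load_array idea should carry over to pickled members by swapping np.load for pickle.load.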

Upvotes: 4
