Onlyjus

Reputation: 6159

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific question stems from an attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.

Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the array into pieces and save each piece to its own *.mat file, so that I can load the pieces one at a time into a Python array. If I divide up the data by frame, I end up with 15,000 *.mat files, which quickly becomes a pain to work with (at least on Windows). So my solution is to store them in a zip archive.

Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?

Specs: Python 2.7, Windows XP

Current code:

import scipy.io
import zipfile
import numpy as np

def readZip(zfilename,dim,frames):
    data=np.zeros((dim[0],dim[1],dim[2],frames),dtype=np.float32)
    zfile = zipfile.ZipFile( zfilename, "r" )
    i=0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat=scipy.io.loadmat(fname)
        data[:,:,:,i]=mat['export']
        mat.clear()
        i=i+1
    return data

Tried code:

mat=scipy.io.loadmat(zfile.read(fname))

produces this error:

TypeError: file() argument 1 must be encoded string without NULL bytes, not str

mat=scipy.io.loadmat(zfile.open(fname))

produces this error:

fileobj.seek(0)
UnsupportedOperation: seek

Any other suggestions on handling the data are appreciated.

Thanks!

Upvotes: 5

Views: 3589

Answers (2)

Onlyjus

Reputation: 6159

I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.

Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.

Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.

Python code:

import h5py

r = h5py.File(fname, 'r+')
data = r['data']

I can now index directly into the data; however, it stays on the hard drive until a slice is actually read.

print data[:,:,:,1]

Or I can load it into memory.

data_mem = data[:]

However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
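A minimal sketch of that frame-by-frame loop, assuming the dataset is named 'data' and the frame index is the last axis as in the indexing above. The function name load_frames and the float32 target type are my own choices for illustration (float32 halves the memory needed compared with float64); if the dimensions appear in a different order in your file, the axis used in the loop would need to change accordingly.

```python
import h5py
import numpy as np


def load_frames(path, dataset="data", dtype=np.float32):
    """Copy an HDF5 dataset into a preallocated array, one frame at a time.

    Assumes the last axis of the dataset is the frame index.
    """
    with h5py.File(path, "r") as f:
        dset = f[dataset]
        # Allocate the full array once, in the (smaller) target dtype.
        out = np.empty(dset.shape, dtype=dtype)
        for i in range(dset.shape[-1]):
            # Only one frame is read from disk per iteration;
            # the assignment casts it to the target dtype.
            out[..., i] = dset[..., i]
    return out
```

Reading one frame at a time avoids holding a second full-size copy in the file's original dtype while the conversion happens, which is likely what pushes data[:] over the memory limit.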

h5py FTW!

Upvotes: 3

g.d.d.c

Reputation: 48028

In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:

import os
import zipfile

# myDir is the application's directory (defined elsewhere in the app)
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))

data = z.read('schema-new.sql')

I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
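Passing the bytes from z.read() straight to scipy.io.loadmat will not work, because loadmat treats a string argument as a file name (that is the TypeError shown in the question). loadmat does accept an open file-like object, though, so wrapping the bytes in io.BytesIO gives it something seekable. A sketch of the question's function rewritten that way; the name read_zip and the key parameter are mine, and it assumes each archive member is a .mat file holding one frame under the variable name 'export', with the members stored in frame order:

```python
import io
import zipfile

import numpy as np
import scipy.io


def read_zip(zfilename, dim, frames, key="export"):
    """Load one frame per .mat member of a zip archive without extracting to disk."""
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    with zipfile.ZipFile(zfilename, "r") as zfile:
        for i, info in enumerate(zfile.infolist()):
            # Read the member into memory and wrap it in a seekable buffer.
            buf = io.BytesIO(zfile.read(info.filename))
            mat = scipy.io.loadmat(buf)
            data[:, :, :, i] = mat[key]
    return data
```

Each member is decompressed into memory in full before it is parsed, so this only helps when the individual files are small, as they are when the data is split by frame.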

Upvotes: 0
