Abhishek Kannan

Reputation: 988

Reading MATLAB files in Python with scipy

I'm using Python with the scipy package to read a MATLAB file.

However, it takes too long and crashes.

The dataset is about 50 MB in size.

Is there a better way to read the data and form an edge list?

My Python code:

import scipy.io as io
data = io.loadmat('realitymining.mat')
print(data)

Upvotes: 8

Views: 1615

Answers (3)

blue note

Reputation: 29081

You could just save each field of the struct to a separate text file, e.g.:

save('friends.txt', '-struct', 'network', 'friends', '-ascii')

and load each file separately from Python:

friends = numpy.loadtxt('friends.txt')

which loads instantly.
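
Once a matrix such as friends is loaded, turning it into an edge list could look roughly like the sketch below. It assumes the matrix is a square adjacency matrix in which a non-zero entry at (i, j) means an edge from i to j; that is an assumption about the data, not something confirmed here. The helper name adjacency_to_edge_list and the toy matrix are made up for illustration.

```python
import numpy as np

def adjacency_to_edge_list(adj):
    """Return (i, j) index pairs for every non-zero, non-NaN entry of a matrix."""
    adj = np.asarray(adj, dtype=float)
    mask = np.nan_to_num(adj) != 0      # treat NaN entries as "no edge"
    rows, cols = np.nonzero(mask)       # indices in row-major order
    return list(zip(rows.tolist(), cols.tolist()))

# Toy stand-in for the real matrix (made-up values)
friends = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])
edges = adjacency_to_edge_list(friends)
```

For an undirected network each edge appears twice, once as (i, j) and once as (j, i); keeping only pairs with i < j would remove the duplicates.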

Upvotes: 1

Ray

Reputation: 2508

Maybe you can first work on part of the data, such as the network in the struct. I have unpacked it here using MATLAB.

I'm still working on how to tidy up the rest of the bigger struct.

Upvotes: 0

hpaulj

Reputation: 231395

I can load it after unzipping, but it stretches the memory.

When I try to load it with Octave I get:

octave:1> load realitymining.mat
error: memory exhausted or requested size too large for range of Octave's index type -- trying to return to prompt

In IPython:

In [10]: data.keys()
Out[10]: ['network', 's', '__version__', '__header__', '__globals__']
In [14]: data['__header__']
Out[14]: 'MATLAB 5.0 MAT-file, Platform: MACI, Created on: Tue Sep 29 20:13:23 2009'
In [15]: data['s'].shape
Out[15]: (1, 106)
In [17]: data['s'].dtype
Out[17]: dtype([('comm', 'O'), ('charge', 'O'), ('active', 'O'), ('logtimes', 'O'),...  
   ('my_intros', 'O'), ('home_nights', 'O'), ('comm_local', 'O'), ('data_mat', 'O')])
# 58 fields
In [24]: data['s']['comm'][0,1].shape
Out[24]: (1, 30)
In [31]: data['s']['comm'][0,1][0,1]
Out[31]: ([[732338.8737731482]], [[355]], [[-1]], [u'Packet Data'], [u'Outgoing'], 
    [[40]], [[nan]])
In [33]: data['s']['comm'][0,1]['date']
Out[33]: 
array([[array([[ 732338.86915509]]), array([[ 732338.87377315]]),
    ...
    array([[ 732340.48579861]]), array([[ 732340.52778935]])]], dtype=object)

Look at the pieces. Simply trying to print data or print data['s'] takes too long. Apparently it is just too big a structure to format quickly.
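
A small helper can make that kind of inspection less manual. This is only a sketch: summarize is a made-up name, and the toy array below merely mimics the layout seen above (a structured array with object fields); it is not the real data.

```python
import numpy as np

def summarize(arr, max_fields=10):
    """Describe a structured array by shape and field types, without formatting its contents."""
    lines = ["shape=%s dtype.kind=%s" % (arr.shape, arr.dtype.kind)]
    names = arr.dtype.names or ()           # None for non-structured arrays
    for name in names[:max_fields]:
        first = arr[name].flat[0]           # first element stored in this field
        lines.append("%s: %s %s" % (name, type(first).__name__,
                                    getattr(first, "shape", "")))
    return lines

# Toy array imitating data['s'] (field names from the output above, values made up)
toy = np.empty((1, 2), dtype=[('comm', 'O'), ('charge', 'O')])
toy['comm'][0, 0] = np.zeros((1, 30))
toy['comm'][0, 1] = np.zeros((1, 5))
toy['charge'][0, 0] = np.zeros((1, 4))
toy['charge'][0, 1] = np.zeros((1, 4))
lines = summarize(toy)
```

Printing the returned lines shows one short row per field instead of the full nested contents.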

To get at this data in practice, I'd suggest loading it once in Python or MATLAB, and then saving the useful pieces to one or more files.
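
On the Python side, that could be sketched as follows. The variable_names argument of scipy.io.loadmat restricts loading to the named variables, so the large s struct can be skipped. The helper extract_field is a made-up name, the toy file only stands in for realitymining.mat, and the code assumes the variable is a 1x1 struct whose field holds a plain numeric array.

```python
import os
import tempfile
import numpy as np
import scipy.io as io

def extract_field(mat_path, var, field, out_path):
    """Load one variable from a .mat file and save one of its struct fields as .npy."""
    data = io.loadmat(mat_path, variable_names=[var])  # other variables are not read
    value = data[var][field][0, 0]                      # 1x1 struct -> field contents
    np.save(out_path, value)
    return value.shape

# Toy file standing in for realitymining.mat (contents are made up)
tmp = tempfile.mkdtemp()
mat_path = os.path.join(tmp, "toy.mat")
io.savemat(mat_path, {"network": {"friends": np.eye(3)}, "s": np.arange(5)})

out_path = os.path.join(tmp, "friends.npy")
shape = extract_field(mat_path, "network", "friends", out_path)
reloaded = np.load(out_path)   # later sessions only need this fast load
```

Fields that hold nested object arrays, like those in s, would need pickling or further unpacking before saving, which this sketch does not attempt.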

Upvotes: 0
