Reputation: 988
I'm using Python with the SciPy package to read the MATLAB file.
However, it takes too long and crashes.
The dataset is about 50 MB in size.
Is there a better way to read the data and form an edge list?
My Python code:
import scipy.io as io
data=io.loadmat('realitymining.mat')
print data
Upvotes: 8
Views: 1615
Reputation: 29081
You could just save each field of the struct to a separate text file from MATLAB, e.g.:
save('friends.txt', '-struct', 'network', 'friends', '-ascii')
and then load each file separately from Python:
friends = numpy.loadtxt('friends.txt')
which loads instantly.
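Once the matrix is in a text file, an edge list can be read off its nonzero entries. A minimal sketch, assuming friends is a square adjacency matrix where a nonzero entry (i, j) means an edge from i to j (the layout is an assumption, not something stated above). The small matrix written here is made-up sample data standing in for friends.txt:

```python
import os
import tempfile

import numpy as np

# Made-up 3x3 adjacency matrix standing in for the real friends.txt
sample = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 0]], dtype=float)
path = os.path.join(tempfile.mkdtemp(), 'friends_sample.txt')
np.savetxt(path, sample)

friends = np.loadtxt(path)

# Treat NaN (missing values) as "no edge", then take the nonzero positions
friends = np.nan_to_num(friends)
rows, cols = np.nonzero(friends)
edges = list(zip(rows.tolist(), cols.tolist()))
print(edges)
```

The indices are 0-based, so add 1 to each if you need them to match MATLAB's subject numbering.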
Upvotes: 1
Reputation: 2508
Maybe you can first work on part of the data, such as the network
field of the struct. I have unpacked it here using MATLAB.
I am still working on how to tidy up the rest of the larger struct.
Upvotes: 0
Reputation: 231395
I can load it after unzipping, but it strains the available memory.
When I try to load it with Octave,
I get:
octave:1> load realitymining.mat
error: memory exhausted or requested size too large for range of Octave's index type -- trying to return to prompt
In IPython:
In [10]: data.keys()
Out[10]: ['network', 's', '__version__', '__header__', '__globals__']
In [14]: data['__header__']
Out[14]: 'MATLAB 5.0 MAT-file, Platform: MACI, Created on: Tue Sep 29 20:13:23 2009'
In [15]: data['s'].shape
Out[15]: (1, 106)
In [17]: data['s'].dtype
Out[17]: dtype([('comm', 'O'), ('charge', 'O'), ('active', 'O'), ('logtimes', 'O'),...
('my_intros', 'O'), ('home_nights', 'O'), ('comm_local', 'O'), ('data_mat', 'O')])
# 58 fields
In [24]: data['s']['comm'][0,1].shape
Out[24]: (1, 30)
In [31]: data['s']['comm'][0,1][0,1]
Out[31]: ([[732338.8737731482]], [[355]], [[-1]], [u'Packet Data'], [u'Outgoing'],
[[40]], [[nan]])
In [33]: data['s']['comm'][0,1]['date']
Out[33]:
array([[array([[ 732338.86915509]]), array([[ 732338.87377315]]),
...
array([[ 732340.48579861]]), array([[ 732340.52778935]])]], dtype=object)
Look at the pieces individually, as above. Simply trying to print data
or print data['s']
takes too long. Apparently the structure is just too big to format quickly.
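One way to look at the pieces without formatting the contents is to print only shapes and field names. A rough sketch: describe is a hypothetical helper, not part of SciPy, and the small structured array below is a made-up stand-in for what loadmat returns.

```python
import numpy as np


def describe(name, value):
    """Return one line with type, shape and field names, never the contents."""
    if isinstance(value, np.ndarray):
        fields = value.dtype.names or ()
        return '%s: ndarray shape=%s fields=%s' % (name, value.shape, list(fields))
    return '%s: %s' % (name, type(value).__name__)


# Made-up stand-in for the dict returned by loadmat
s = np.empty((1, 3), dtype=[('comm', 'O'), ('charge', 'O')])
data = {'__header__': 'stand-in header', 's': s}

for key in sorted(data):
    print(describe(key, data[key]))
```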
To get at this data in practice, I'd suggest loading it once in Python or MATLAB and then saving the useful pieces to one or more files.
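A sketch of that load-once-and-save idea on the Python side. It writes a tiny made-up .mat file as a stand-in for realitymining.mat, then uses loadmat's variable_names argument to read only the network variable, which should avoid building the large s struct at all. The field name friends is taken from the other answer; the sample contents are invented.

```python
import os
import tempfile

import numpy as np
import scipy.io as io

tmp = tempfile.mkdtemp()
mat_path = os.path.join(tmp, 'sample.mat')

# Made-up stand-in file: a small struct plus a second, unwanted variable
io.savemat(mat_path, {
    'network': {'friends': np.eye(3)},
    's': np.zeros((1, 106)),
})

# variable_names restricts loading to the listed variables
data = io.loadmat(mat_path, variable_names=['network'])

# A MATLAB struct comes back as a (1, 1) structured array:
# select the field, then index [0, 0] to get the actual matrix
friends = data['network']['friends'][0, 0]

# Save the piece in NumPy's binary format so later runs can skip the .mat file
npy_path = os.path.join(tmp, 'friends.npy')
np.save(npy_path, friends)
reloaded = np.load(npy_path)
print(reloaded.shape)
```

Later sessions then only need np.load on the small file.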
Upvotes: 0