Reputation: 3930
I'm reading data from 20k mat files into an array. After reading around 13k files, the process ends with a "Killed" message. It looks like the problem is that too many files are open. I've tried to find out how to explicitly "close" mat files in Python, but the only thing I found was savemat, which is not what I need in this case.
How can I explicitly close mat files in Python?
import scipy.io

x = []
with open('mat_list.txt', 'r') as f:
    for l in f:
        l = l.replace('\n', '')
        mat = scipy.io.loadmat(l)
        x.append(mat['data'])
Upvotes: 4
Views: 1401
Reputation: 10298
You don't need to. loadmat does not keep the file open. If given a file name, it loads the contents of the file into memory, then immediately closes it. You can use a file object like @nils-werner suggested, but you will get no benefit from doing so. You can see this from looking at the source code.
You are most likely running out of memory due to simply having too much data at a time. The first thing I would try is to load all the data into one big numpy array. You know the size of each file, and you know how many files there are, so you can pre-allocate an array of the right size and write the data to slices of that array. This will also tell you right away if this is a problem with your array size.
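The pre-allocation idea can be sketched roughly as follows. This is a minimal illustration, not a drop-in fix: the helper name `load_into_preallocated` is hypothetical, and it assumes every file stores an array of the same shape and dtype under the `data` key, which the question does not confirm.

```python
import numpy as np
import scipy.io


def load_into_preallocated(paths, key="data"):
    """Load `key` from every .mat file into one pre-allocated array.

    Assumption: all files hold an array of identical shape and dtype.
    """
    # Read the first file once to learn the per-file shape and dtype.
    first = scipy.io.loadmat(paths[0])[key]
    # Reserve room for all files up front; if the total size is the
    # problem, it should typically show up here rather than after
    # thousands of files have already been read.
    out = np.empty((len(paths),) + first.shape, dtype=first.dtype)
    out[0] = first
    # Write each remaining file into its own slice of the big array.
    for i, path in enumerate(paths[1:], start=1):
        out[i] = scipy.io.loadmat(path)[key]
    return out
```

With the file names from `mat_list.txt` collected into a list of stripped lines, `x = load_into_preallocated(paths)` would replace the list-append loop from the question.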
If you are still running out of memory, you will need another solution. A simple option is dask. It lets you create something that looks and acts like a numpy array but is split into chunks that are loaded and processed lazily, so the full data set never has to be in memory at once. That makes it possible to work with data sets too large to fit into memory. bcolz and blaze offer similar capabilities, although not as seamlessly.
If these are not an option, h5py and pytables allow you to store data sets to files incrementally rather than having to keep the whole thing in memory at once.
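As a rough sketch of the incremental approach with h5py: the helper name `mats_to_hdf5` and the output path are hypothetical, and the code again assumes that every file holds an array of the same shape and dtype under the `data` key.

```python
import h5py
import scipy.io


def mats_to_hdf5(paths, out_path, key="data"):
    """Copy `key` from each .mat file into one on-disk HDF5 dataset.

    Assumption: all files hold an array of identical shape and dtype.
    """
    # Read the first file once to learn the per-file shape and dtype.
    first = scipy.io.loadmat(paths[0])[key]
    with h5py.File(out_path, "w") as h5:
        # The dataset lives in the HDF5 file, not in RAM.
        dset = h5.create_dataset(
            key, shape=(len(paths),) + first.shape, dtype=first.dtype
        )
        dset[0] = first
        # Only one file's worth of data is held in memory at a time.
        for i, path in enumerate(paths[1:], start=1):
            dset[i] = scipy.io.loadmat(path)[key]
    return out_path
```

Afterwards the data can be read back slice by slice, for example `h5py.File(out_path, "r")["data"][100:200]`, without loading the whole data set.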
Overall, I think this question is a classic example of the XY Problem. It is generally much better to state your symptoms, and ask for help on those symptoms, rather than guessing what the solution is and asking for someone to help you implement the solution.
Upvotes: 4
Reputation: 36729
You can pass an open file handle to scipy.io.loadmat:
import scipy.io

x = []
with open('mat_list.txt', 'r') as f:
    for l in f:
        l = l.replace('\n', '')
        # .mat files are binary, so open them in 'rb' mode
        with open(l, 'rb') as matfile:
            mat = scipy.io.loadmat(matfile)
            x.append(mat['data'])
Leaving the with open() context will then automatically close the file.
Upvotes: 1