Ke Xu

Reputation: 35

How to efficiently store a very large list in Python

Question: I have a big collection of 3D images that I would like to store in one file. How can I do this efficiently?

Background: The dataset has about 1,000 3D MRI images, each 256 by 256 by 156 voxels. To avoid frequently opening and closing files, I was trying to store all of them in one big list and export it.

So far I have tried reading each MRI in as a 3D numpy array and appending it to a list. When I tried to save the list using numpy.save, it consumed all my memory and exited with a "MemoryError".
Here is the code I tried:

import numpy as np
import nibabel as nib
import os

file_list = os.listdir('path/to/files')

data = []
for file in file_list:
    mri = nib.load(os.path.join('path/to/files', file))
    mri_array = np.array(mri.dataobj)  # realize the voxel data as a numpy array
    data.append(mri_array)             # append inside the loop

np.save('imported.npy',data)

Expected Outcome:

Is there a better way to store such a dataset without consuming too much memory?

Upvotes: 1

Views: 1365

Answers (1)

busybear

Reputation: 10590

The HDF5 file format and numpy's memmap are the two options I would go to first if you want to jam all your data into one file. Neither loads all the data into memory at once.

Python has the h5py package for handling HDF5 files. HDF5 files have a lot of features, and I would generally lean toward this option. It would look something like this:

import h5py

with h5py.File('data.h5', 'w') as h5file:           # 'w' creates/overwrites the file
    for n, image in enumerate(mri_images):          # mri_images: your iterable of 3D arrays
        h5file[f'image{n}'] = image                  # each image becomes its own dataset
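
Reading the data back is lazy as well: indexing an h5py dataset reads just the requested slice from disk. A minimal sketch, assuming the data.h5 file and image{n} keys created above:

import h5py

with h5py.File('data.h5', 'r') as h5file:
    middle_slice = h5file['image0'][:, :, 78]  # loads only this single 256x256 slice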

memmap works with raw binary files, so it is not feature-rich at all. This would look something like:

import numpy as np

# dtype and shape must match the data you are writing
bin_file = np.memmap('data.bin', mode='w+', dtype=int, shape=(1000, 256, 256, 156))
for n, image in enumerate(mri_images):
    bin_file[n] = image
del bin_file    # flushes the data to disk
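
Reading the file back is just as cheap: a memmap opened in read mode only pulls data off disk when you index it. A minimal sketch, assuming the data.bin file written above (the dtype and shape have to be repeated, since a raw binary file stores neither):

import numpy as np

images = np.memmap('data.bin', mode='r', dtype=int, shape=(1000, 256, 256, 156))
first = np.array(images[0])  # copies only image 0 into memory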

Upvotes: 2
