Joseph

Reputation: 353

Memory error with pickle.dump while saving/loading data to/from disk

I have a dataset of 40,000 examples with shape (40000, 2048). After some processing, I would like to store and load this dataset efficiently. The dataset is a NumPy array.

I used pickle to store this dataset, but it takes a long time to store and even longer to load, and I eventually get a memory error.

I tried to split the dataset into several chunks as follows:

import pickle

with open('dataset_10000.sav', 'wb') as handle:
    pickle.dump(train_frames[:10000], handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('dataset_20000.sav', 'wb') as handle:
    pickle.dump(train_frames[10000:20000], handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('dataset_30000.sav', 'wb') as handle:
    pickle.dump(train_frames[20000:30000], handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('dataset_35000.sav', 'wb') as handle:
    pickle.dump(train_frames[30000:35000], handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('dataset_40000.sav', 'wb') as handle:
    pickle.dump(train_frames[35000:], handle, protocol=pickle.HIGHEST_PROTOCOL)

However, I still get a memory error, and the resulting files are too heavy.

What is the best/most optimized way to save and load such a large dataset to/from disk?

Upvotes: 0

Views: 4716

Answers (1)

juanpa.arrivillaga

Reputation: 96127

For numpy.ndarray objects, use numpy.save, which you should prefer over pickle anyway since it is more portable. It should also be faster and require less memory during serialization.
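A minimal sketch, assuming the array is named train_frames as in the question (the filename dataset.npy is just a placeholder):

import numpy as np

# Write the whole array to a single binary .npy file in one call;
# no manual chunking into separate pickle files is needed.
np.save('dataset.npy', train_frames)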

You can then load it with numpy.load, which even provides a memmap option, allowing you to work with arrays that are too large to fit into memory.
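A sketch of loading it back with memory mapping, using the same placeholder filename as above:

import numpy as np

# mmap_mode='r' maps the file read-only instead of reading it all into RAM;
# data is pulled from disk only when you actually index into the array.
data = np.load('dataset.npy', mmap_mode='r')
first_batch = data[:10000]  # only this slice is materialized in memory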

Upvotes: 1
