9879ypxkj

Reputation: 447

How to store a HUGE Python list in a file and then read the file back as a list?

I'm working on a machine learning project whose dataset consists of thousands of x-ray pictures. Every time I want to work on the project I have to reload the pictures and pre-process them, which is very time-consuming. I would like to read my images once and write the list of thousands of 224x224x3 matrices to a file that I can load every time I need to work on this project.

I've already found some functions that let me write/read lists, but they don't seem to write the whole matrices, only part of them:

This is the code I used to write the file:

with open(obj_dir +"train_data_p", "w") as file:
  file.write(str(train_data_p))

This is what I get when I open my training dataset file in Notepad. As you can see from the "...," parts, it shows only snippets of the matrices:

[array([[[0.26666668, 0.26666668, 0.26666668],
        [0.32156864, 0.32156864, 0.32156864],
        [0.33333334, 0.33333334, 0.33333334],
        ...,
        [0.75686276, 0.75686276, 0.75686276],
        [0.77254903, 0.77254903, 0.77254903],
        [0.7764706 , 0.7764706 , 0.7764706 ]],
   [[0.27058825, 0.27058825, 0.27058825],
    [0.28627452, 0.28627452, 0.28627452],
    [0.31764707, 0.31764707, 0.31764707],
    ...,
    [0.7607843 , 0.7607843 , 0.7607843 ],
    [0.7647059 , 0.7647059 , 0.7647059 ],
    [0.8039216 , 0.8039216 , 0.8039216 ]],

   [[0.3019608 , 0.3019608 , 0.3019608 ],
    [0.34901962, 0.34901962, 0.34901962],
    [0.27058825, 0.27058825, 0.27058825],
    ...,
    [0.78431374, 0.78431374, 0.78431374],
    [0.7764706 , 0.7764706 , 0.7764706 ],
    [0.78431374, 0.78431374, 0.78431374]],

   ...,

   [[0.1254902 , 0.1254902 , 0.1254902 ],
    [0.1254902 , 0.1254902 , 0.1254902 ],
    [0.12156863, 0.12156863, 0.12156863],

How can I write/store the whole dataset so I don't have to read and process the images every time? Help me please!

Upvotes: 0

Views: 139

Answers (3)

S.Au.Ra.B.H

Reputation: 457

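You are seeing ellipses in the file because you are writing str(train_data_p), the string representation of the list, rather than the actual train_data_p object. By default NumPy abbreviates the printed form of any array with more than 1000 elements, so the "..." is written literally into the file and the omitted values are lost.

As pointed out by the other answers, there are numerous packages that help with storing large data. If you are using numpy, this answer may help you too.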
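A minimal sketch of why the file is incomplete. The array below is a hypothetical stand-in for one 224x224x3 image from the question; the point is only that its string form is far shorter than the data and contains a literal "...":

```python
import numpy as np

# Hypothetical stand-in for one preprocessed image (224x224x3 floats).
image = np.zeros((224, 224, 3), dtype=np.float32)

# str() produces NumPy's abbreviated display form, not the full data.
text = str(image)

print(image.size)     # number of values actually held in the array
print(len(text))      # length of the abbreviated text
print("..." in text)  # the ellipsis is part of the string itself
```

Anything written with file.write(str(...)) therefore cannot be turned back into the original arrays.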

Upvotes: 1

Ravi Satya Yenugula

Reputation: 111

You can do it with the numpy.save() and numpy.load() functions:

import numpy as np
# np.save appends ".npy" when the file name does not already end with it
np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
arr = np.load('/tmp/123.npy')
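Applied to the question's case, a rough sketch might look like the following. It assumes every image in train_data_p has the same shape, so the list can be stacked into one array; the random data, the number of images, and the temporary path are placeholders, not the asker's actual setup:

```python
import os
import tempfile

import numpy as np

# Placeholder for the preprocessed images: a list of equally shaped arrays.
rng = np.random.default_rng(0)
train_data_p = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(4)]

# Stack the list into a single array of shape (n_images, 224, 224, 3).
stacked = np.stack(train_data_p)

# Write the full binary data to disk (no abbreviation, unlike str()).
path = os.path.join(tempfile.mkdtemp(), "train_data_p.npy")
np.save(path, stacked)

# Later: load the array and, if needed, turn it back into a list of images.
loaded = np.load(path)
restored = list(loaded)
```

If the images had different shapes, np.stack would raise an error and another approach would be needed.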

Upvotes: 1

Alexandr Shurigin

Reputation: 3981

You can easily serialize your data using built-in modules.

There are several options in the standard library, or you can use any third-party serialization package available through pip.

More about serialization: https://en.wikipedia.org/wiki/Serialization
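As one illustration of the built-in route, here is a small sketch using the standard library's pickle module. The choice of pickle is an assumption, since the answer's own list of options is not preserved here, and the nested list and temporary path are placeholders:

```python
import os
import pickle
import tempfile

# Placeholder data: a small nested list standing in for the real dataset.
data = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]]

path = os.path.join(tempfile.mkdtemp(), "train_data_p.pkl")

# Serialize the whole object to a binary file.
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back as a Python list.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Note that pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.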

Upvotes: 0
